[HACKERS] [RFC] LSN Map
Hi Hackers,

In order to make incremental backup (https://wiki.postgresql.org/wiki/Incremental_backup) efficient, we need a way to track the LSN of a page such that we can retrieve it without reading the actual block. Below is my proposal on how to achieve it.

LSN Map
-------

The purpose of the LSN map is to quickly know whether a page of a relation has been modified after a specified checkpoint.

Implementation
--------------

We create an additional fork which contains a raw stream of LSNs. To limit the space used, every entry represents the maximum LSN of a group of blocks of a fixed size. I chose arbitrarily a group size of 2048 blocks, which is equivalent to 16MB of heap data; this means that we need 64k entries to track one terabyte of heap. (A minimal sketch of the block-to-entry mapping follows the patch below.)

Name
----

I've called this map the LSN map, and I've named the corresponding fork file "lm".

WAL logging
-----------

At the moment the map is not WAL-logged, but it is updated during WAL replay. I'm not deep enough into the WAL mechanics to see whether the current approach is sane or whether we should change it.

Current limits
--------------

The current implementation tracks only heap LSNs. It does not currently track any kind of index, but this can easily be added later. The implementation of commands that rewrite the whole table can be improved: CLUSTER uses shared memory buffers instead of writing the map directly to disk, and moving a table to another tablespace simply drops the map instead of updating it correctly.

Further ideas
-------------

The current implementation updates an entry in the map every time a block gets its LSN bumped, but we really only need to know which is the first checkpoint that contains expired data. So setting the entry to the last checkpoint LSN is probably enough, and it will reduce the number of writes. To implement this we only need a backend-local copy of the last checkpoint LSN, updated during each XLogInsert. Again, I'm not deep enough into the replication mechanics to see whether this approach could work on a standby using restartpoints instead of checkpoints. Please advise on the best way to implement it.

Conclusions
-----------

This code is incomplete, and the xlog replay part must be improved/fixed, but I think it's a good start towards having this feature. I will appreciate any review, advice or criticism.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

From 89a943032f0a10fd093c126d15fbf81e5861dbe3 Mon Sep 17 00:00:00 2001
From: Marco Nenciarini <marco.nenciar...@2ndquadrant.it>
Date: Mon, 3 Nov 2014 17:52:27 +0100
Subject: [PATCH] LSN Map

This is a WIP. Only heap is supported. No indexes, no sequences.
---
 src/backend/access/heap/Makefile      |   2 +-
 src/backend/access/heap/heapam.c      | 239 ++--
 src/backend/access/heap/hio.c         |  11 +-
 src/backend/access/heap/lsnmap.c      | 336 ++
 src/backend/access/heap/pruneheap.c   |  10 +
 src/backend/access/heap/rewriteheap.c |  37 +++-
 src/backend/catalog/storage.c         |   8 +
 src/backend/commands/tablecmds.c      |   5 +-
 src/backend/commands/vacuumlazy.c     |  35 +++-
 src/backend/storage/smgr/smgr.c       |   1 +
 src/common/relpath.c                  |   5 +-
 src/include/access/hio.h              |   3 +-
 src/include/access/lsnmap.h           |  28 +++
 src/include/common/relpath.h          |   5 +-
 src/include/storage/smgr.h            |   1 +
 15 files changed, 687 insertions(+), 39 deletions(-)
 create mode 100644 src/backend/access/heap/lsnmap.c
 create mode 100644 src/include/access/lsnmap.h

diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..776ee7d 100644
*** a/src/backend/access/heap/Makefile
--- b/src/backend/access/heap/Makefile
*************** subdir = src/backend/access/heap
*** 12,17 ****
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o lsnmap.o
  include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 21e9d06..9486562 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***************
*** 48,53 ****
--- 48,54 ----
  #include "access/tuptoaster.h"
  #include "access/valid.h"
  #include "access/visibilitymap.h"
+ #include "access/lsnmap.h"
  #include "access/xact.h"
  #include "access/xlog.h"
  #include "access/xloginsert.h"
*************** heap_insert(Relation relation, HeapTuple
*** 2067,2073 ****
  	TransactionId xid = GetCurrentTransactionId();
  	HeapTuple	heaptup;
  	Buffer		buffer;
!
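For readers who want to see the arithmetic behind the 2048-block grouping, here is a minimal sketch of how a heap block number could be mapped to an LSN map entry and how an entry would be kept at its group's maximum LSN. This is illustrative only and assumes PostgreSQL's standard types; the constant and function names are not taken from the patch.

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/block.h"

/* Illustrative only: 2048 heap blocks of 8kB = 16MB of heap per entry. */
#define LSNMAP_BLOCKS_PER_ENTRY 2048
#define LSNMAP_ENTRIES_PER_PAGE (BLCKSZ / sizeof(XLogRecPtr))

/* Which lm-fork page and offset cover a given heap block. */
static void
lsnmap_locate(BlockNumber heapBlk, BlockNumber *mapPage, int *mapOffset)
{
	uint32		entryno = heapBlk / LSNMAP_BLOCKS_PER_ENTRY;

	*mapPage = entryno / LSNMAP_ENTRIES_PER_PAGE;
	*mapOffset = entryno % LSNMAP_ENTRIES_PER_PAGE;
}

/* An entry only ever moves forward, to the highest LSN seen in its group. */
static void
lsnmap_bump(XLogRecPtr *entry, XLogRecPtr pageLSN)
{
	if (pageLSN > *entry)
		*entry = pageLSN;
}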
Re: [HACKERS] XLOG_PARAMETER_CHANGE handling of wal_log_hints
On 07/01/15 00:59, Michael Paquier wrote:
> On Wed, Jan 7, 2015 at 4:24 AM, Petr Jelinek <p...@2ndquadrant.com> wrote:
>> Hi, when I was fixing how commit_ts handles the XLOG_PARAMETER_CHANGE I noticed that for wal_log_hints we assign the value in ControlFile to the current value instead of the value that comes from WAL. ISTM it has just informational value so it should not have any practical impact, but it looks like a bug anyway, so here is a simple patch to change that.
>
> Right. That's something that should get fixed in 9.4 as well..
>
> ControlFile->track_commit_timestamp = track_commit_timestamp;
> The new value of track_commit_timestamp should be taken as well from xlrec on master btw.

That's part of the larger fix for CommitTs that I sent separately in response to the bug report from Fujii - there is much more to be done there for CommitTs than just this.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
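For context, the change being discussed is small: in the redo routine that handles XLOG_PARAMETER_CHANGE, the control file field should be set from the WAL record rather than from the standby's own current GUC value. A hedged sketch of the shape of the fix (paraphrased, not a quote of the actual patch):

/* In xlog_redo(), handling XLOG_PARAMETER_CHANGE -- sketch only */
xl_parameter_change xlrec;

memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change));

/* before: records this server's own setting */
/*   ControlFile->wal_log_hints = wal_log_hints; */

/* after: record the value carried by the WAL record from the master */
ControlFile->wal_log_hints = xlrec.wal_log_hints;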
Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC
On 06/01/15 14:26, Robert Haas wrote:
> I suggest leaving this out altogether for the first version. I can think of three possible ways that we can determine which blocks need to be backed up. One, just read every block in the database and look at the LSN of each one. Two, maintain a cache of LSN information on a per-segment (or smaller) basis, as you suggest here. Three, scan the WAL generated since the incremental backup and summarize it into a list of blocks that need to be backed up. This last idea could either be done when the backup is requested, or it could be done as the WAL is generated and used to populate the LSN cache.
>
> In the long run, I think some variant of approach #3 is likely best, but in the short run, approach #1 (scan everything) is certainly easiest. While it doesn't optimize I/O, it still gives you the benefit of reducing the amount of data that needs to be transferred and stored, and that's not nothing. If we get that much working, we can improve things more later.

Hi,

The patch now uses approach #1, but I've just sent a patch that uses approach #2: 54ad016e.9020...@2ndquadrant.it

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
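To make approach #1 concrete, the per-block decision amounts to comparing each page's LSN with the LSN at which the previous backup started. A minimal sketch with made-up function and parameter names (this is not code from either patch):

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/bufpage.h"

/*
 * Approach #1: every block is read, but only blocks whose page LSN is
 * newer than the previous backup's start LSN need to be shipped.
 */
static bool
block_needs_backup(Page page, XLogRecPtr prev_backup_start_lsn)
{
	XLogRecPtr	page_lsn = PageGetLSN(page);

	/* Pages never touched by WAL have an invalid (zero) LSN; copy them. */
	if (XLogRecPtrIsInvalid(page_lsn))
		return true;

	return page_lsn >= prev_backup_start_lsn;
}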
Re: [HACKERS] parallel mode and parallel contexts
On 6 January 2015 at 21:37, Simon Riggs <si...@2ndquadrant.com> wrote:
> I get it now and agree

Yes, very much.

Should we copy both the top-level frame and the current subxid? Hot Standby links subxids directly to the top-level, so this would be similar. If we copied both, we wouldn't need to special-case the Get functions. It would still be O(1).

> but please work some more on clarity of README, comments and variable naming!

Something other than "top" sounds good.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On 6 January 2015 at 20:44, Peter Geoghegan <p...@heroku.com> wrote:
> Another issue that I see is that there is only one resultRelation between the INSERT and UPDATE. That means that without some additional special measure, both UPDATE and INSERT WITH CHECK ( ... ) options are applied regardless of whether the INSERT path was taken, or the UPDATE path. Maybe that's actually sensible (note that even this doesn't happen right now, since the auxiliary UPDATE is currently minimally processed by the rewriter). What do people think about that? It seems like it might be less surprising to have that fail earlier when there are separate policies for INSERT and UPDATE -- for UPSERT, all policies are applied regardless of which path is taken.

I think the policies applied should depend on the path taken, so if it does an INSERT, then only the INSERT CHECK policy should be applied (after the insert), but if it ends up doing an UPDATE, I would expect the UPDATE USING policy to be applied (before the update) and the UPDATE CHECK policy to be applied (after the update). I would not expect the INSERT CHECK policy to be applied on the UPDATE path.

As to whether the UPDATE USING policy should cause an error to be thrown if it is not satisfied, my inclination would be to not error, and use the command tag to report that no rows were updated, since that is what would happen with a regular UPDATE.

So overall INSERT .. ON CONFLICT UPDATE would be consistent with either an INSERT or an UPDATE, depending on whether the row existed beforehand, which is easier to explain than having some special UPSERT semantics. (A small illustrative sketch follows below.)

Regards,
Dean
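To illustrate the behaviour Dean describes, here is a hedged SQL sketch. The table, policies and values are made up, and the UPSERT statement uses the ON CONFLICT ... DO UPDATE form the feature eventually settled on; the exact syntax was still under discussion in this thread.

-- Separate per-command policies.
CREATE TABLE accounts (id int PRIMARY KEY, owner text, balance numeric);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

CREATE POLICY accounts_insert ON accounts FOR INSERT
    WITH CHECK (owner = current_user);

CREATE POLICY accounts_update ON accounts FOR UPDATE
    USING (owner = current_user)
    WITH CHECK (balance >= 0);

-- Under the path-dependent behaviour: if no row with id = 1 exists, only
-- accounts_insert's WITH CHECK is applied (after the insert); if the row
-- exists, accounts_update's USING is applied before the update and its
-- WITH CHECK after it, and accounts_insert is not consulted at all.
INSERT INTO accounts (id, owner, balance)
VALUES (1, current_user, 100)
ON CONFLICT (id) DO UPDATE SET balance = accounts.balance + 100;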
Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs
2015-01-05 Tom Lane <t...@sss.pgh.pa.us>:
> What would make sense to me is to teach the planner about inlining SQL functions that include ORDER BY clauses, so that the performance issue of a double sort could be avoided entirely transparently to the user.

Another way of getting to the point where the extra check-node is not needed in obvious cases would be:

* Apply the current patch in some form.
* Later, add code that analyzes the query inside the function. If it turns out that the result of the analysis implies the declared order, don't add the check-node.

The analysis can in principle also be performed for other languages, but that would most likely be way more complex for the typical Turing-complete languages.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Sat, Jan 3, 2015 at 2:15 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> writes:
>> While looking at the patch for supporting inheritance on foreign tables, I noticed that if a transaction makes changes to more than two foreign servers the current implementation in postgres_fdw doesn't make sure that either all of them rollback or all of them commit their changes, IOW there is a possibility that some of them commit their changes while others rollback theirs.
>>
>> PFA patch which uses 2PC to solve this problem. In pgfdw_xact_callback() at XACT_EVENT_PRE_COMMIT event, it prepares the transaction at all the foreign PostgreSQL servers and at XACT_EVENT_COMMIT or XACT_EVENT_ABORT event it commits or aborts those transactions resp.
>
> TBH, I think this is a pretty awful idea. In the first place, this does little to improve the actual reliability of a commit occurring across multiple foreign servers; and in the second place it creates a bunch of brand new failure modes, many of which would require manual DBA cleanup.
>
> The core of the problem is that this doesn't have anything to do with 2PC as it's commonly understood: for that, you need a genuine external transaction manager that is aware of all the servers involved in a transaction, and has its own persistent state (or at least a way to reconstruct its own state by examining the per-server states). This patch is not that; in particular it treats the local transaction asymmetrically from the remote ones, which doesn't seem like a great idea --- ie, the local transaction could still abort after committing all the remote ones, leaving you no better off in terms of cross-server consistency.
>
> As far as failure modes go, one basic reason why this cannot work as presented is that the remote servers may not even have prepared transaction support enabled (in fact max_prepared_transactions = 0 is the default in all supported PG versions). So this would absolutely have to be a not-on-by-default option.

Agreed. We can have a per-foreign-server option, which says whether the corresponding server can participate in 2PC. A transaction spanning multiple foreign servers with at least one of them not capable of participating in 2PC will need to be aborted.

> But the bigger issue is that leaving it to the DBA to clean up after failures is not a production grade solution, *especially* not for prepared transactions, which are performance killers if not closed out promptly. So I can't imagine anyone wanting to turn this on without a more robust answer than that.

I purposefully left that outside this patch, since it involves significant changes in core. If that's necessary for the first cut, I will work on it.

> Basically I think what you'd need for this to be a credible patch would be for it to work by changing the behavior only in the PREPARE TRANSACTION path: rather than punting as we do now, prepare the remote transactions, and report their server identities and gids to an external transaction manager, which would then be responsible for issuing the actual commits (along with the actual commit of the local transaction). I have no idea whether it's feasible to do that without having to assume a particular 2PC transaction manager API/implementation.

I doubt a TM would expect a bunch of GIDs in response to a PREPARE TRANSACTION command. Per X/Open, xa_prepare() expects an integer return value, specifying whether the PREPARE succeeded or not, plus some piggybacked statuses.

In the context of foreign tables under an inheritance tree, a single DML statement can span multiple foreign servers. All such DMLs would then need to be handled by an external TM. An external TM or application may not have an exact idea as to which foreign servers are going to be affected by a DML. Users may not want to set up an external TM in such cases. Instead they would expect PostgreSQL to manage such DMLs and transactions all by itself. As Robert has suggested in his responses, it would be better to enable PostgreSQL to manage distributed transactions itself.

> It'd be interesting to hear from people who are using 2PC in production to find out if this would solve any real-world problems for them, and what the details of the TM interface would need to look like to make it work in practice. In short, you can't force 2PC technology on people who aren't using it already; while for those who are using it already, this isn't nearly good enough as-is.
>
> regards, tom lane

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
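For readers less familiar with the mechanics under discussion: the remote half of any such scheme rests on PostgreSQL's existing two-phase commit commands. A minimal sketch of the command sequence a coordinator would issue against each foreign server (the GID below is made up, and a real implementation would need durable bookkeeping and retries):

-- Phase 1: on each foreign server, instead of a plain COMMIT
PREPARE TRANSACTION 'fdw_xact_1234_server1';

-- Phase 2: once the local transaction's fate is known, resolve every
-- prepared transaction, retrying until it succeeds
COMMIT PREPARED 'fdw_xact_1234_server1';
-- or, if the local transaction aborted:
-- ROLLBACK PREPARED 'fdw_xact_1234_server1';

-- This only works if the remote server allows prepared transactions,
-- i.e. max_prepared_transactions > 0 (the default is 0).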
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Tue, Jan 6, 2015 at 11:55 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Jan 5, 2015 at 3:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Well, we intentionally didn't couple the FDW stuff closely into transaction commit, because of the thought that the far end would not necessarily have Postgres-like transactional behavior, and even if it did there would be about zero chance of having atomic commit with a non-Postgres remote server. postgres_fdw is a seriously bad starting point as far as that goes, because it encourages one to make assumptions that can't possibly work for any other wrapper.
>
> Atomic commit is something that can potentially be supported by many different FDWs, as long as the thing on the other end supports 2PC. If you're talking to Oracle or DB2 or SQL Server, and it supports 2PC, then you can PREPARE the transaction and then go back and COMMIT the transaction once it's committed locally. Getting a cluster-wide *snapshot* is probably a PostgreSQL-only thing requiring much deeper integration, but I think it would be sensible to leave that as a future project and solve the simpler problem first.
>
>> I think the idea I sketched upthread of supporting an external transaction manager might be worth pursuing, in that it would potentially lead to having at least an approximation of atomic commit across heterogeneous servers.
>
> An important threshold question here is whether we want to rely on an external transaction manager, or build one into PostgreSQL. As far as this particular project goes, there's nothing that can't be done inside PostgreSQL. You need a durable registry of which transactions you prepared on which servers, and which XIDs they correlate to. If you have that, then you can use background workers or similar to go retry commits or rollbacks of prepared transactions until it works, even if there's been a local crash meanwhile.
>
> Alternatively, you could rely on an external transaction manager to do all that stuff. I don't have a clear sense of what that would entail, or how it might be better or worse than rolling our own. I suspect, though, that it might amount to little more than adding a middle man. I mean, a third-party transaction manager isn't going to automatically know how to commit a transaction prepared on some foreign server using some foreign data wrapper. It's going to have to be taught that if postgres_fdw leaves a transaction in-medias-res on server OID 1234, you've got to connect to the target machine using that foreign server's connection parameters, speak libpq, and issue the appropriate COMMIT TRANSACTION command. And similarly, you're going to need to arrange to notify it before preparing that transaction so that it knows that it needs to request the COMMIT or ABORT later on. Once you've got all of that infrastructure for that in place, what are you really gaining over just doing it in PostgreSQL (or, say, a contrib module thereto)?

Thanks Robert for giving a high-level view of the system needed for PostgreSQL to be a transaction manager by itself. Agreed completely.

> (I'm also concerned that an external transaction manager might need the PostgreSQL client to be aware of it, whereas what we'd really like here is for the client to just speak PostgreSQL and be happy that its commits no longer end up half-done.)
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] pg_rewind in contrib
On 01/07/2015 01:54 AM, Andrew Dunstan wrote:
> I also think it's a great idea. But I think we should consider the name carefully. pg_resync might be a better name. Strictly, you might not be quite rewinding, AIUI.

pg_resync sounds too generic. It's true that if the source server has changes of its own, then it's more of a sideways movement than rewinding, but I think it's nevertheless a good name. It does always rewind the control file, so that after startup, WAL replay begins from the last common point in history between the servers. WAL replay will catch up with the source server, which might be ahead of the last common point, but strictly speaking pg_rewind is not involved at that point anymore.

- Heikki
Re: [HACKERS] pg_rewind in contrib
On 01/07/2015 02:17 AM, Andres Freund wrote:
> On 2015-01-06 15:39:29 -0500, Peter Eisentraut wrote:
>> It wouldn't hurt if we could share SimpleXLogPageRead() between pg_xlogdump and pg_rewind as the differences are more or less superficial, but that seems simple enough to achieve by putting a frontend variant in xlogreader.c/h.

It's not that much code. And although they're almost identical now, it's not clear that pg_rewind and pg_xlogdump would want the same behaviour in the future. pg_rewind might want to read files from a WAL archive, for example. (Not that I have any plans for that right now.)

>> If this ends up shipping, it's going to be a massively popular tool. I see it as a companion to pg_basebackup. So it should sort of work the same way. One problem is that it doesn't use the replication protocol, so the setup is going to be inconsistent with pg_basebackup. Maybe the replication protocol could be extended to provide the required data.
>
> I'm not particularly bothered by the requirement of also requiring a normal, not replication, connection. In most cases that'll already be allowed.

Yeah, it's not a big problem in practice, but it sure would be nice for usability if it wasn't required. One less thing to document and prepare for.

>> Maybe something as simple as "give me this file" would work.
>
> Well, looking at libpq_fetch.c it seems there'd be a bit more required. Not having to create a temporary table on the other side would be nice - afaics there's otherwise not much stopping this from working against a standby?

Hmm, that could be done. The temporary table is convenient, but it could be refactored to work without it.

>> That might lose the local copy mode, but how important is that? pg_basebackup doesn't have that mode.
>
> But we have plain pg_start/stop_backup for that case. That alternative doesn't really exist here.

Also, the local mode works when the server is shut down. The pg_basebackup analogue of that is just cp -a $PGDATA backup.

- Heikki
Re: [HACKERS] pg_rewind in contrib
On 01/06/2015 10:39 PM, Peter Eisentraut wrote:
> If this ends up shipping, it's going to be a massively popular tool. I see it as a companion to pg_basebackup. So it should sort of work the same way. One problem is that it doesn't use the replication protocol, so the setup is going to be inconsistent with pg_basebackup. Maybe the replication protocol could be extended to provide the required data. Maybe something as simple as "give me this file" would work.

Yeah, that would be nice. But I think we can live with it as it is for now, and add that later.

> That might lose the local copy mode, but how important is that? pg_basebackup doesn't have that mode. In any case, the documentation doesn't explain this distinction. The option documentation is a bit short in any case, but it's not clear that you can choose between local and remote mode.

Changing the libpq mode to use additional replication protocol commands would be a localized change to libpq_fetch.c. No need to touch the local copy mode.

> The test suite should probably be reimplemented in Perl. (I might be able to help.) Again, ingenious, but it's very hard to follow the sequence of what is being tested. And some Windows person is going to complain. ;-)

Yeah, totally agreed.

- Heikki
Re: [HACKERS] Fillfactor for GIN indexes
On Wed, Dec 3, 2014 at 2:37 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov <aekorot...@gmail.com> wrote:
>> On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier <michael.paqu...@gmail.com> wrote:
>>> Please find attached a simple patch adding fillfactor as storage parameter for GIN indexes. The default value is the same as the one currently aka 100 to have the pages completely packed when a GIN index is created.
>>
>> That's not true. Let us discuss it a little bit. [blah discussion]

That's quite a nice explanation. Thanks!

>> My summary is following:
>> 1) In order to have fully correct support of fillfactor in GIN we need to rewrite GIN build algorithm.
>> 2) Without rewriting GIN build algorithm, not much can be done with entry tree. However, you can implement some heuristics.

TBH, I am not really planning to rewrite the whole code.

>> 3) You definitely need to touch code that selects ratio of split in dataPlaceToPageLeaf (starting with "if (!btree->isBuild)").

OK I see, so for a split we need to have a calculation based on the fillfactor, with 75% by default.

>> 4) GIN data pages are always compressed except pg_upgraded indexes from pre-9.4. Take care about it in following code.
>>
>>     if (GinPageIsCompressed(page))
>>         freespace = GinDataLeafPageGetFreeSpace(page);
>> +   else if (btree->isBuild)
>> +       freespace = BLCKSZ * (100 - fillfactor) / 100;

Hm. Simply reversing both conditions is fine, no?

> This is a very interesting explanation; thanks for writing it up! It does leave me wondering why anyone would want fillfactor for GIN at all, even if they were willing to rewrite the build algorithm.

Based on the explanation of Alexander, the current GIN code fills in a page at 75% for a split, and was doing even 50/50 pre-9.4 if I recall correctly. IMO a higher fillfactor makes sense for a GIN index that gets less random updates, no?

I am attaching an updated patch, with the default fillfactor value at 75%, and with the page split code using the fillfactor rate. Thoughts? (A hedged usage example follows the patch below.)
--
Michael

From 39c6cab320f648b61cc65654ae67e62d8926bfa1 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@otacoo.com>
Date: Fri, 21 Nov 2014 14:08:54 +0900
Subject: [PATCH] Support fillfactor for GIN indexes

Users can call this new storage parameter to fill in the entry and
leaf pages of a newly-built index as wanted. Fillfactor range varies
between 20 and 100.
---
 doc/src/sgml/ref/create_index.sgml     |  4 ++--
 src/backend/access/common/reloptions.c |  9 +
 src/backend/access/gin/gindatapage.c   | 34 --
 src/backend/access/gin/ginentrypage.c  | 20 +++-
 src/backend/access/gin/ginutil.c       |  3 ++-
 src/include/access/gin_private.h       |  3 +++
 6 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 6b2ee28..c0ba24a 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -294,8 +294,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
   <para>
    The optional <literal>WITH</> clause specifies <firstterm>storage parameters</>
    for the index.  Each index method has its own set of allowed
-   storage parameters.  The B-tree, hash, GiST and SP-GiST index methods all
-   accept this parameter:
+   storage parameters.  The B-tree, hash, GIN, GiST and SP-GiST index methods
+   all accept this parameter:
   </para>

  <variablelist>

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index f008fab..418ecec 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -15,6 +15,7 @@
 #include "postgres.h"

+#include "access/gin_private.h"
 #include "access/gist_private.h"
 #include "access/hash.h"
 #include "access/htup_details.h"
@@ -133,6 +134,14 @@ static relopt_int intRelOpts[] =
 	},
 	{
 		{
+			"fillfactor",
+			"Packs gin index pages only to this percentage",
+			RELOPT_KIND_GIN
+		},
+		GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+	},
+	{
+		{
 			"autovacuum_vacuum_threshold",
 			"Minimum number of tuple updates or deletes prior to vacuum",
 			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 8e81f6c..eebfa62 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -446,11 +446,21 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 	leafSegmentInfo *lastleftinfo;
 	ItemPointerData maxOldItem;
 	ItemPointerData remaining;
+	int			fillfactor;

 	Assert(GinPageIsData(page));

 	rbound = *GinDataPageGetRightBound(page);

+	/* Grab option values */
+	if (btree->index->rd_options)
+	{
+		GinOptions *options = (GinOptions *) btree->index->rd_options;
+		fillfactor = options->fillfactor;
+	}
+	else
+		fillfactor = GIN_DEFAULT_FILLFACTOR;
+
 	/*
 	 * Count how many of the new items belong
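As a usage illustration (not part of the patch; the table and column names are made up), with this patch applied the parameter would be set like any other index storage parameter:

-- Build a GIN index leaving roughly 25% of each data page free for later insertions
CREATE INDEX docs_body_gin ON docs USING gin (to_tsvector('english', body))
    WITH (fillfactor = 75);

-- The value can be changed later; it takes effect at the next index rebuild
ALTER INDEX docs_body_gin SET (fillfactor = 90);
REINDEX INDEX docs_body_gin;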
Re: [HACKERS] parallel mode and parallel contexts
On Tue, Jan 6, 2015 at 4:37 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> So when you say "Only the top frame of the transaction state stack is copied" you don't mean the top, you mean the bottom (the latest subxact)? Which then becomes the top in the parallel worker? OK...

The item most recently added to the stack is properly called the top, but I guess it's confusing in this case because the item on the bottom of the stack is referred to as the TopTransaction. I'll see if I can rephrase that.

> Those comments really belong in a README, not the first visible comment in xact.c

OK.

> You need to start with the explanation that parallel workers have a faked-up xact stack to make it easier to copy and manage. That is valid because we never change xact state during a worker operation.

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_rewind in contrib
On 01/06/2015 10:39 PM, Peter Eisentraut wrote:
> The test suite should probably be reimplemented in Perl. (I might be able to help.) Again, ingenious, but it's very hard to follow the sequence of what is being tested. And some Windows person is going to complain. ;-)

I took a shot at rewriting the tests in Perl. It works, but I would appreciate help in reviewing and fixing any bad Perl I may have written. I have not added new tests or changed the things they test; this is just a straightforward translation from the mix of bash scripts and pg_regress to Perl.

- Heikki

Attachment: pg_rewind-contrib-3.patch.gz (application/gzip)
Re: [HACKERS] psql -c does not honor ON_ERROR_STOP
On 01/06/2015 12:36 PM, Tom Lane wrote:
> -c submits the entire string to the backend in one PQexec(); therefore ON_ERROR_STOP cannot have any impact on its behavior. The backend will abandon processing the whole string upon first error, embedded begin/commit commands notwithstanding.
>
> There's been repeated discussion of changing -c so that the string is split apart and processed more like it would be if it'd been read from stdin. However, given the number of ways you can already submit a string via stdin, this wouldn't be buying any new functionality. Even discounting backwards-compatibility considerations, such a change would make it completely impossible to test multiple-commands-per-PQexec scenarios using psql. So I'm inclined to leave it alone. Possibly the point is worth documenting explicitly, though.

OK, thanks. I'll submit a proposal for an update to the docs.

Joe

--
Joseph E Conway
credativ LLC
270 E Douglas Ave, El Cajon, CA 92020
Office: +1 619 270 8787  Mobile: +1 619 843 8340
USA: http://www.credativ.us  Canada: http://www.credativ.ca
Germany: http://www.credativ.de  Netherlands: http://www.credativ.nl
UK: http://www.credativ.co.uk  India: http://www.credativ.in
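A quick illustration of the behavioural difference being documented (the SQL statements are made up):

# With -c the whole string goes to the server in a single PQexec();
# ON_ERROR_STOP has no effect, and the backend itself stops at the first error.
psql -v ON_ERROR_STOP=1 -c "SELECT 1/0; SELECT 'never reached';"

# Feeding the same commands on stdin lets psql split and run them one by one,
# so ON_ERROR_STOP determines whether the second statement runs and whether
# psql exits with a nonzero status.
echo "SELECT 1/0; SELECT 'runs only without ON_ERROR_STOP';" | psql -v ON_ERROR_STOP=1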
Re: [HACKERS] Fillfactor for GIN indexes
On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov <aekorot...@gmail.com> wrote:
> On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier <michael.paqu...@gmail.com> wrote:
>> I am attaching an updated patch, with the default fillfactor value at 75%, and with the page split code using the fillfactor rate. Thoughts?
>
> Rewritten version of patch is attached. I made following changes:

Thanks! With this patch (and my previous version as well) GIN indexes with default fillfactor have a size higher than 9.4 indexes, 9.4 behavior being consistent only with fillfactor=100 and not the default of 90. Are we fine with that?
--
Michael
Re: [HACKERS] Turning recovery.conf into GUCs
On 01/07/2015 02:31 PM, Peter Eisentraut wrote:
> On 1/6/15 7:33 PM, Josh Berkus wrote:
>> D. Weird name: for those doing only replication, not PITR, recovery.conf is completely baffling.
>
> I don't disagree, but standby.enabled or whatever isn't any better, for the inverse reason.

Yeah. Like I said, I posted a list of bugs and features so that we can test your solution and the pg.conf merger to see how much they actually improve things.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Wed, Jan 7, 2015 at 12:01 PM, Stephen Frost <sfr...@snowman.net> wrote:
> Other databases have this capability and have triggers, and at least one ends up firing both INSERT and UPDATE triggers, with many complaints from users about how that ends up making the performance suck. Perhaps we could use that as a fallback but support the explicit single trigger option too.. Just some thoughts, apologies if it's already been covered in depth previously.

I would like to expose whether or not statement-level triggers are being called in the context of an INSERT ... ON CONFLICT UPDATE, FWIW.

--
Peter Geoghegan
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Wed, Jan 7, 2015 at 12:26 PM, Stephen Frost <sfr...@snowman.net> wrote:
> I agree with that up to a point- this case feels more like a POLA violation though rather than a run-of-the-mill "you need to test all that". I'm a bit worried this might lead to cases where users can't use UPSERT because they don't know which set of policies will end up getting applied, although the above approach is strictly a subset of the approach I'm advocating, but at least with the 'throw it all together' case, you know exactly what you're getting with UPSERT.

I think that it makes sense to bunch the policies together, because INSERT ... ON CONFLICT UPDATE is mostly an INSERT - informally speaking, the UPDATE part is just enough to resolve a conflict - this is ultimately just a new clause of INSERT. Besides, I think that making the permissions as restrictive as possible is the least surprising behavior. It's clearly the most secure behavior.

I think that we collectively lean towards actively throwing an error as the preferred behavior - I think that's good, because the intent of the user to UPDATE some particular tuple that they're not allowed to UPDATE is clear and unambiguous. As long as we're doing that, clearly writing a query that will fail based on the wrong path being taken is wrong-headed. Now, you can construct an UPSERT case that a "bunch together policies" approach doesn't serve well. I could be mistaken, but I think that case is likely to be more than a little contrived.

We're now going to be testing the TARGET.* tuple before going to update, having failed to insert (and after row locking the TARGET.* tuple) - the target tuple is really how the update finds the existing row to update, so it should be tested, without necessarily being blamed directly (certainly, we cannot leak the TARGET.* tuple values, so maybe we should try blaming the EXCLUDED.* tuple first for error messaging reasons). If it fails because of the insert check after row locking but before update, well, it was liable to anyway, when the insert path was taken. Conversely, if it failed after actually inserting where that would not happen with a vanilla INSERT (due to a bunched-together update USING() security barrier qual or explicit user-supplied CHECK OPTION), well, you'd have gotten the same thing if you took the UPDATE path anyway.

That just leaves the regular ExecUpdate() call of ExecWithCheckOptions(). When that is called with bunched together policies, you're now testing the final tuple. So it might be a bit confusing that the policy of final tuples originating from INSERTs is also enforced here, for the final tuple of an UPDATE. But in order for even that to be the case, you'd have to have a final tuple that was substantially different from the one originally proposed for insertion that made the difference between a qual passing and not passing. How likely is that in realistic use cases? And besides, isn't the policy for INSERTs still quite relevant even here? We're talking about a final tuple that originated from an INSERT statement, after all.

--
Peter Geoghegan
[HACKERS] empty select list allowed when using function as table
Interestingly, the following query works (it didn't used to):

select from generate_series(1, 1);

Was this intentional?

merlin
Re: [HACKERS] empty select list allowed when using function as table
On 2015-01-08 01:13, Merlin Moncure wrote:
> Interestingly, the following query works (it didn't used to):
>
> select from generate_series(1, 1);
>
> Was this intentional?

See 1b4f7f93b4693858cb983af3cd557f6097dab67b

.marko
Re: [HACKERS] Possible typo in create_policy.sgml
* Robert Haas (robertmh...@gmail.com) wrote:
> On Tue, Jan 6, 2015 at 4:07 PM, Peter Geoghegan <p...@heroku.com> wrote:
>> On Tue, Jan 6, 2015 at 1:03 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> I thought my rewrite clarified this distinction pretty well. Maybe I'm wrong? We're talking about the same paragraph.
>>
>> Sorry, I didn't express myself clearly. I think that you did get it right, but I would like to see that distinction also reflected in the actual sgml PARAMETER class tag. expression is way too generic here.
>
> I'm happy to see us change that if it makes sense, but I'm not sure what that actually does.

If I'm following correctly, Peter's specifically talking about:

[ USING ( <replaceable class="parameter">expression</replaceable> ) ]
[ WITH CHECK ( <replaceable class="parameter">check_expression</replaceable> ) ]

Where the USING parameter is 'expression' but the WITH CHECK parameter is 'check_expression'. He makes a good point, I believe, as expression is overly generic. I don't like the idea of using barrier_expression though as that ends up introducing a new term- how about using_expression?

This would need to be reflected below in the documentation which talks about expression as the parameter to USING, but I can certainly handle cleaning that up.

Thanks!

Stephen
Re: [HACKERS] pg_basebackup fails with long tablespace paths
On Tue, Jan 6, 2015 at 4:33 PM, Peter Eisentraut <pete...@gmx.net> wrote:
> Currently, when you unpack a tarred basebackup with tablespaces, the symlinks will tell you whether you have unpacked the tablespace tars at the right place. Otherwise, how do you know? Secondly, you also have the option of putting the tablespaces somewhere else by changing the symlinks.

That's a good argument for making the tablespace-map file human-readable and human-editable, but I don't think it's an argument for duplicating its contents inaccurately in the filesystem.

> One way to address this would be to do away with the symlinks altogether and have pg_tblspc/12345 be a text file that contains the tablespace location. Kind of symlinks implemented in user space.

Well, that's just spreading the tablespace-map file out into several files, and maybe keeping it around after we've restored from backup.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Possible typo in create_policy.sgml
On Tue, Jan 6, 2015 at 4:07 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Tue, Jan 6, 2015 at 1:03 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> I thought my rewrite clarified this distinction pretty well. Maybe I'm wrong? We're talking about the same paragraph.
>
> Sorry, I didn't express myself clearly. I think that you did get it right, but I would like to see that distinction also reflected in the actual sgml PARAMETER class tag. expression is way too generic here.

I'm happy to see us change that if it makes sense, but I'm not sure what that actually does.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>> I think the policies applied should depend on the path taken, so if it does an INSERT, then only the INSERT CHECK policy should be applied (after the insert), but if it ends up doing an UPDATE, I would expect the UPDATE USING policy to be applied (before the update) and the UPDATE CHECK policy to be applied (after the update). I would not expect the INSERT CHECK policy to be applied on the UPDATE path.
>
> I agree.

I can certainly understand the appeal of this approach, but I don't think it makes sense. Consider what happens later on down the road, after the code has been written and deployed using INSERT .. ON CONFLICT UPDATE where 99% of the time only one path or the other is taken. Then the other path is taken and suddenly the exact same command and row ends up returning errors. Additional testing should have been done to check if that happens, of course, but I really don't like the idea that the exact same command, with the exact same policies, would succeed or fail, due to policies, based on the data in the database. That's not to say that, generally speaking, the data in the database shouldn't impact policy failures, but in this case, the policies might not even reference any other tables in the database.

>> As to whether the UPDATE USING policy should cause an error to be thrown if it is not satisfied, my inclination would be to not error, and use the command tag to report that no rows were updated, since that is what would happen with a regular UPDATE.
>
> My inclination would be to error, but I'd be OK with your proposal.

I'm pretty strongly in favor of erroring. :)

>> So overall INSERT .. ON CONFLICT UPDATE would be consistent with either an INSERT or an UPDATE, depending on whether the row existed beforehand, which is easier to explain than having some special UPSERT semantics.
>
> Yeah. We won't escape the question so easily where triggers are concerned, but at least for RLS it seems like it should be possible to avoid confusing, one-off semantics.

Triggers are another question.. One approach that I don't recall seeing (and I'm not convinced that it's a good idea either, but thought it deserved mention) is the idea of having an UPSERT-specific set of policies (and perhaps an UPSERT-specific trigger...?). That gets pretty grotty when you consider existing systems which have INSERT and UPDATE triggers already, but UPSERT is a pretty big deal and a big change, and for users who are just starting to use it, requiring that they write triggers and policies specifically to address that case might be acceptable. That doesn't address the cases where users have direct SQL access and might be able to then bypass existing triggers, but we might be able to work out an answer to that case..

Other databases have this capability and have triggers, and at least one ends up firing both INSERT and UPDATE triggers, with many complaints from users about how that ends up making the performance suck. Perhaps we could use that as a fallback but support the explicit single trigger option too..

Just some thoughts, apologies if it's already been covered in depth previously.

Thanks!

Stephen
Re: [HACKERS] Possible typo in create_policy.sgml
* Peter Geoghegan (p...@heroku.com) wrote:
> I also don't see this behavior documented (this is from process_policies()):
>
> /*
>  * If we end up with only USING quals, then use those as
>  * WITH CHECK quals also.
>  */
> if (with_check_quals == NIL)
>     with_check_quals = copyObject(quals);
>
> Now, I do see a reference to it under Per-Command policies - ALL. It says:
>
> "If an INSERT or UPDATE command attempts to add rows to the table which do not pass the ALL WITH CHECK (or USING, if no WITH CHECK expression is defined) expression, the command will error."
>
> But is that really the right place for it? Does it not equally well apply to FOR UPDATE policies, that can on their own have both barrier quals and WITH CHECK options? Basically, that seems to me like a *generic* property of policies (it's a generic thing that the WITH CHECK options/expressions shadow the USING security barrier quals as check options), and so should be documented as such.

Ah, yes, good point, I can add more documentation around that.

Thanks!

Stephen
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Jan 7, 2015 at 3:01 PM, Stephen Frost <sfr...@snowman.net> wrote:
>> * Robert Haas (robertmh...@gmail.com) wrote:
>>> On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>>>> I think the policies applied should depend on the path taken, so if it does an INSERT, then only the INSERT CHECK policy should be applied (after the insert), but if it ends up doing an UPDATE, I would expect the UPDATE USING policy to be applied (before the update) and the UPDATE CHECK policy to be applied (after the update). I would not expect the INSERT CHECK policy to be applied on the UPDATE path.
>>>
>>> I agree.
>>
>> I can certainly understand the appeal of this approach, but I don't think it makes sense. Consider what happens later on down the road, after the code has been written and deployed using INSERT .. ON CONFLICT UPDATE where 99% of the time only one path or the other is taken. Then the other path is taken and suddenly the exact same command and row ends up returning errors.
>
> I'd say: that's life. If you don't test your policies, they might not work.

I agree with that up to a point- this case feels more like a POLA violation though rather than a run-of-the-mill "you need to test all that". I'm a bit worried this might lead to cases where users can't use UPSERT because they don't know which set of policies will end up getting applied, although the above approach is strictly a subset of the approach I'm advocating, but at least with the 'throw it all together' case, you know exactly what you're getting with UPSERT.

Thanks,

Stephen
Re: [HACKERS] parallel mode and parallel contexts
On 1/7/15, 9:39 AM, Robert Haas wrote:
>> sequence.c: Is it safe to read a sequence value in a parallel worker if the cache_value is 1?
>
> No, because the sequence cache isn't synchronized between the workers. Maybe it would be safe if cache_value == 1, but there's not much use-case: how often are you going to have a read-only query that uses a sequence value. At some point we can look at making sequences parallel-safe, but worrying about it right now doesn't seem like a good use of time.

Agreed, I was more concerned with calls to nextval(), which don't seem to be prevented in parallel mode?

>> This may be a dumb question, but for functions do we know that all pl's besides C and SQL use SPI? If not I think they could end up writing in a worker.
>
> Well, the lower-level checks would catch that. But it is generally true that there's no way to prevent arbitrary C code from doing things that are unsafe in parallel mode and that we can't tell are unsafe. As I've said before, I think that we'll need to have a method of labeling functions as parallel-safe or not, but this patch isn't trying to solve that part of the problem.

I was more thinking about all the add-on pl's like pl/ruby. But yeah, I don't see that there's much we can do if they're not using SPI.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
On 01/07/2015 08:25 AM, Aaron Botsis wrote:
> Hi folks, I was having a problem importing json data with COPY. Lots of things export data nicely as one json blob per line. This is excellent for directly importing into a JSON/JSONB column for analysis.
>
> ...Except when there’s an embedded doublequote. Or anything that’s escaped. COPY handles this, but by the time the escaped char hits the JSON parser, it's not escaped anymore. This breaks the JSON parsing. This means I need to manipulate the input data to double-escape it. See bug #12320 for an example. Yuck.
>
> I propose this small patch that simply allows specifying COPY … ESCAPE without requiring the CSV parser. It will make it much easier to directly use json formatted export data for folks going forward. This seemed like the simplest route. Usage is simply:
>
> postgres=# copy t1 from '/Users/nok/Desktop/queries.json';
> ERROR:  invalid input syntax for type json
> DETAIL:  Token root is invalid.
> CONTEXT:  JSON data, line 1: ...1418066241619 AND =1418671041621) AND user:root...
> COPY t1, line 3, column bleh: {timestamp:2014-12-15T19:17:32.505Z,duration:7.947,query:{query:{filtered:{filter:{qu...
> postgres=# copy t1 from '/Users/nok/Desktop/queries.json' escape '';
> COPY 1966

This isn't a bug. Neither CSV format nor TEXT format is particularly suitable for json. I'm quite certain I could compose legal json that will break your proposal (for example, with an embedded newline in the white space.)

It's also unnecessary. CSV format, while not designed for this, is nevertheless sufficiently flexible to allow successful import of json data meeting certain criteria (essentially no newlines), like this:

copy the_table(jsonfield) from '/path/to/jsondata' csv quote e'\x01' delimiter e'\x02';

You aren't the first person to encounter this problem. See http://adpgtech.blogspot.com/2014/09/importing-json-data.html

Maybe we need to add something like this to the docs, or to the wiki. Note too my comment in that blog post: "Now this solution is a bit of a hack. I wonder if there's a case for a COPY mode that simply treats each line as a single datum. I also wonder if we need some more specialized tools for importing JSON, possibly one or more Foreign Data Wrappers. Such things could handle, say, embedded newline punctuation."

cheers

andrew
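To make the embedded-newline point concrete, here is a small made-up example of legal JSON that defeats any scheme treating one input line as one datum, regardless of escaping; the CSV workaround shown above likewise requires that the data contain no newlines.

-- Both literals below are the same legal JSON document; the second merely has
-- a newline in its insignificant whitespace, so a line-per-datum import would
-- split it into two malformed rows.
SELECT '{"a": 1, "b": "x"}'::json;

SELECT '{"a": 1,
  "b": "x"}'::json;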
Re: [HACKERS] Possible typo in create_policy.sgml
On Wed, Jan 7, 2015 at 3:06 PM, Stephen Frost <sfr...@snowman.net> wrote:
> If I'm following correctly, Peter's specifically talking about:
>
> [ USING ( <replaceable class="parameter">expression</replaceable> ) ]
> [ WITH CHECK ( <replaceable class="parameter">check_expression</replaceable> ) ]
>
> Where the USING parameter is 'expression' but the WITH CHECK parameter is 'check_expression'. He makes a good point, I believe, as expression is overly generic. I don't like the idea of using barrier_expression though as that ends up introducing a new term- how about using_expression?

Oh. Well, I guess we could change that. I don't think it's a problem, myself. I thought he was talking about something in the SGML markup.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Wed, Jan 7, 2015 at 3:01 PM, Stephen Frost <sfr...@snowman.net> wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>>> I think the policies applied should depend on the path taken, so if it does an INSERT, then only the INSERT CHECK policy should be applied (after the insert), but if it ends up doing an UPDATE, I would expect the UPDATE USING policy to be applied (before the update) and the UPDATE CHECK policy to be applied (after the update). I would not expect the INSERT CHECK policy to be applied on the UPDATE path.
>>
>> I agree.
>
> I can certainly understand the appeal of this approach, but I don't think it makes sense. Consider what happens later on down the road, after the code has been written and deployed using INSERT .. ON CONFLICT UPDATE where 99% of the time only one path or the other is taken. Then the other path is taken and suddenly the exact same command and row ends up returning errors.

I'd say: that's life. If you don't test your policies, they might not work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] pg_rewind in contrib
On 01/07/2015 03:22 AM, Heikki Linnakangas wrote:
> On 01/07/2015 01:54 AM, Andrew Dunstan wrote:
>> I also think it's a great idea. But I think we should consider the name carefully. pg_resync might be a better name. Strictly, you might not be quite rewinding, AIUI.
>
> pg_resync sounds too generic. It's true that if the source server has changes of its own, then it's more of a sideways movement than rewinding, but I think it's nevertheless a good name. It does always rewind the control file, so that after startup, WAL replay begins from the last common point in history between the servers. WAL replay will catch up with the source server, which might be ahead of last common point, but strictly speaking pg_rewind is not involved at that point anymore.

I understand, but I think pg_rewind is likely to be misleading to many users, who will say "but I don't want just to rewind". I'm not wedded to the name I suggested, but I think we should look at possible alternative names. We do have experience of misleading names causing confusion (e.g. initdb).

cheers

andrew
Re: [HACKERS] [RFC] LSN Map
On Wed, Jan 7, 2015 at 10:50:38AM +0100, Marco Nenciarini wrote:
> Implementation
> --------------
>
> We create an additional fork which contains a raw stream of LSNs. To limit the space used, every entry represents the maximum LSN of a group of blocks of a fixed size. I chose arbitrarily a group size of 2048 blocks, which is equivalent to 16MB of heap data; this means that we need 64k entries to track one terabyte of heap.

I like the idea of summarizing the LSNs to keep the map's size reasonable. Have you done any measurements to determine how much backup can be skipped using this method for a typical workload, i.e. how many 16MB page ranges are not modified in a typical span between incremental backups?

--
Bruce Momjian  <br...@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] [RFC] LSN Map
Bruce Momjian wrote:
> Have you done any measurements to determine how much backup can be skipped using this method for a typical workload, i.e. how many 16MB page ranges are not modified in a typical span between incremental backups?

That seems entirely dependent on the specific workload.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] parallel mode and parallel contexts
On Tue, Jan 6, 2015 at 9:37 PM, Jim Nasby jim.na...@bluetreble.com wrote:
> CreateParallelContext(): Does it actually make sense to have nworkers=0? ISTM that's a bogus case.

I'm not sure whether we'll ever use the zero-worker case in production, but I've certainly found it useful for performance-testing.

> Also, since the number of workers will normally be determined dynamically by the planner, should that check be a regular conditional instead of just an Assert?

I don't think that's really necessary. It shouldn't be too much of a stretch for the planner to come up with a non-negative value.

> In LaunchParallelWorkers() the Start Workers comment states that we give up registering more workers if one fails to register, but there's nothing in the if condition to do that, and I don't see RegisterDynamicBackgroundWorker() doing it either. Is the comment just incorrect?

Woops, that got changed at some point and I forgot to update the comment. Will fix.

> SerializeTransactionState(): instead of looping through the transaction stack to calculate nxids, couldn't we just set it to maxsize - sizeof(TransactionId) * 3? If we're looping a second time for safety a comment explaining that would be useful...

Yeah, I guess we could do that. I don't think it really matters very much one way or the other.

> sequence.c: Is it safe to read a sequence value in a parallel worker if the cache_value is 1?

No, because the sequence cache isn't synchronized between the workers. Maybe it would be safe if cache_value == 1, but there's not much use-case: how often are you going to have a read-only query that uses a sequence value? At some point we can look at making sequences parallel-safe, but worrying about it right now doesn't seem like a good use of time.

> This may be a dumb question, but for functions do we know that all pl's besides C and SQL use SPI? If not I think they could end up writing in a worker.

Well, the lower-level checks would catch that. But it is generally true that there's no way to prevent arbitrary C code from doing things that are unsafe in parallel mode and that we can't tell are unsafe. As I've said before, I think that we'll need to have a method of labeling functions as parallel-safe or not, but this patch isn't trying to solve that part of the problem.

> @@ -2968,7 +2969,8 @@ ProcessInterrupts(void)
>                errmsg(canceling statement due to user request)));
>        }
>    }
> -  /* If we get here, do nothing (probably, QueryCancelPending was reset) */
> +  if (ParallelMessagePending)
> +      HandleParallelMessageInterrupt(false);
>
> ISTM it'd be good to leave that comment in place (after the if).
>
> RestoreComboCIDState(): Assert(!comboCids || !comboHash): Shouldn't that be ? AIUI both should always be either set or 0.

Fixed.

> Typo: + elog(ERROR, cannot update SecondarySnapshpt during a parallel operation);

Fixed.

> The comment for RestoreSnapshot refers to set_transaction_snapshot, but that doesn't actually exist (or isn't referenced).

Fixed.

Will post a new version in a bit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] LSN Map
On Wed, Jan 7, 2015 at 12:33:20PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>> Have you done any measurements to determine how much backup can be skipped using this method for a typical workload, i.e. how many 16MB page ranges are not modified in a typical span between incremental backups?
> That seems entirely dependent on the specific workload.

Well, obviously. Is that worth even stating? My question is whether there are enough workloads for this to be generally useful, particularly considering the recording granularity, hint bits, and freezing. Do we have cases where 16MB granularity helps compared to file or table-level granularity? How would we even measure the benefits? How would the administrator know they are benefitting from incremental backups vs complete backups, considering the complexity of incremental restores?

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] Fillfactor for GIN indexes
On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier michael.paqu...@gmail.com wrote: On Wed, Dec 3, 2014 at 2:37 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov aekorot...@gmail.com wrote: On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier michael.paqu...@gmail.com wrote: Please find attached a simple patch adding fillfactor as storage parameter for GIN indexes. The default value is the same as the one currently aka 100 to have the pages completely packed when a GIN index is created. That's not true. Let us discuss it a little bit. [blah discussion] That's quite a nice explanation. Thanks! My summary is following: 1) In order to have fully correct support of fillfactor in GIN we need to rewrite GIN build algorithm. 2) Without rewriting GIN build algorithm, not much can be done with entry tree. However, you can implement some heuristics. TBH, I am not really planning to rewrite the whole code. 3) You definitely need to touch code that selects ratio of split in dataPlaceToPageLeaf (starting with if (!btree-isBuild)). OK I see, so for a split we need to have a calculation based on the fillfactor, with 75% by default. 4) GIN data pages are always compressed excepts pg_upgraded indexes from pre-9.4. Take care about it in following code. if (GinPageIsCompressed(page)) freespace = GinDataLeafPageGetFreeSpace(page); + else if (btree-isBuild) + freespace = BLCKSZ * (100 - fillfactor) / 100; Hm. Simply reversing both conditions is fine, no? This is a very interesting explanation; thanks for writing it up! It does leave me wondering why anyone would want fillfactor for GIN at all, even if they were willing to rewrite the build algorithm. Based on the explanation of Alexander, the current GIN code fills in a page at 75% for a split, and was doing even 50/50 pre-9.4 if I recall correctly. IMO a higher fillfactor makes sense for a GIN index that gets less random updates, no? I am attaching an updated patch, with the default fillfactor value at 75%, and with the page split code using the fillfactor rate. Thoughts? Rewritten version of patch is attached. I made following changes: 1) I removed fillfactor handling from entry tree. Because in this case fillfactor parameter would be only upper bound for actual page fullness. It's very like GiST approach to fillfactor. But I would rather remove fillfactor from GiST than spread such approach to other AMs. 2) Fillfactor handling for posting trees is fixed. 3) Default fillfactor for GIN is reverted to 90. I didn't mean that default fillfactor should be 75%. I mean that equilibrium state of tree would be about 75% fullness. But that doesn't mean that we don't want indexes to be better packed just after creation. Anyway I didn't see reason why default fillfactor for GIN btree should differs from default fillfactor of regular btree. -- With best regards, Alexander Korotkov. gin_fillfactor_3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
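To make the point about the quoted hunk concrete, the build-time branch under discussion amounts to something like the following. This is a sketch only: it follows the patch being reviewed, with the two conditions reversed as Michael suggests, and is not committed GIN code.

int		freespace = 0;

if (btree->isBuild)
{
	/*
	 * While building the index, reserve (100 - fillfactor)% of each leaf
	 * page so later insertions have room before a page split is needed.
	 */
	freespace = BLCKSZ * (100 - fillfactor) / 100;
}
else if (GinPageIsCompressed(page))
{
	/* Pages written by 9.4+ are compressed; use their actual free space. */
	freespace = GinDataLeafPageGetFreeSpace(page);
}
/* else: pre-9.4 uncompressed page from pg_upgrade; leave freespace at 0 */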
Re: [HACKERS] Fillfactor for GIN indexes
On Thu, Jan 8, 2015 at 12:31 AM, Alexander Korotkov aekorot...@gmail.com wrote:
> Rewritten version of patch is attached. I made following changes:
> 1) I removed fillfactor handling from entry tree. Because in this case fillfactor parameter would be only upper bound for actual page fullness. It's very like GiST approach to fillfactor. But I would rather remove fillfactor from GiST than spread such approach to other AMs.
> 2) Fillfactor handling for posting trees is fixed.
> 3) Default fillfactor for GIN is reverted to 90. I didn't mean that default fillfactor should be 75%. I mean that equilibrium state of tree would be about 75% fullness. But that doesn't mean that we don't want indexes to be better packed just after creation. Anyway I didn't see reason why default fillfactor for GIN btree should differs from default fillfactor of regular btree.

Just forgot simple case that I used to test the patch.

create table test as (select array[1,2,3] v from generate_series(1,100));
create index test_idx20 on test using gin(v) with (fillfactor=20);
create index test_idx30 on test using gin(v) with (fillfactor=30);
create index test_idx40 on test using gin(v) with (fillfactor=40);
create index test_idx50 on test using gin(v) with (fillfactor=50);
create index test_idx60 on test using gin(v) with (fillfactor=60);
create index test_idx70 on test using gin(v) with (fillfactor=70);
create index test_idx80 on test using gin(v) with (fillfactor=80);
create index test_idx90 on test using gin(v) with (fillfactor=90);
create index test_idx100 on test using gin(v) with (fillfactor=100);

                     List of relations
 Schema |    Name     | Type  | Owner  | Table |  Size   | Description
--------+-------------+-------+--------+-------+---------+-------------
 public | test_idx20  | index | smagen | test  | 16 MB   |
 public | test_idx30  | index | smagen | test  | 11 MB   |
 public | test_idx40  | index | smagen | test  | 8152 kB |
 public | test_idx50  | index | smagen | test  | 6520 kB |
 public | test_idx60  | index | smagen | test  | 5176 kB |
 public | test_idx70  | index | smagen | test  | 4480 kB |
 public | test_idx80  | index | smagen | test  | 3928 kB |
 public | test_idx90  | index | smagen | test  | 3520 kB |
 public | test_idx100 | index | smagen | test  | 3184 kB |
(9 rows)

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] KNN-GiST with recheck
On Tue, Dec 16, 2014 at 4:37 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Patch attached. It should be applied on top of my pairing heap patch at http://www.postgresql.org/message-id/548ffa2c.7060...@vmware.com. Some caveats:
> * The signature of the distance function is unchanged, it doesn't get a recheck argument. It is just assumed that if the consistent function sets the recheck flag, then the distance needs to be rechecked as well. We might want to add the recheck argument, like you Alexander did in your patch, but it's not important right now.

I didn't get how that is expected to work if we have only an ORDER BY qual without a filter qual. In that case the consistent function just isn't called at all.

> * I used the distance term in the executor, although the ORDER BY expr machinery is more general than that. The value returned by the ORDER BY expression doesn't have to be a distance, although that's the only thing supported by GiST and the built-in opclasses.
> * I short-circuited the planner to assume that the ORDER BY expression always returns a float. That's true today for knn-GiST, but is obviously a bogus assumption in general.
> This needs some work to get into a committable state, but from a modularity point of view, this is much better than having the indexam peek into the heap.

Nice idea to put the reordering into the index scan node. It doesn't look like much of an overengineering. I'm going to bring it to a more committable state.

--
With best regards,
Alexander Korotkov.
Re: [HACKERS] Turning recovery.conf into GUCs
On 1/6/15 7:33 PM, Josh Berkus wrote:
> D. Wierd name: for those doing only replication, not PITR, recovery.conf is completely baffling.

I don't disagree, but standby.enabled or whatever isn't any better, for the inverse reason. But replication and PITR are the same thing, so any name is going to have that problem.

One way out of that would be to develop higher-level abstractions, like pg_ctl start -m standby or something, similar to how pg_ctl promote is an abstraction and got people away from fiddling with trigger files.
Re: [HACKERS] VODKA?
On Jan 6, 2015 7:14 PM, Josh Berkus j...@agliodbs.com wrote:
> Oleg, Teodor: I take it VODKA is sliding to version 9.6?
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com

This is kinda off, but I was wondering if anyone ever considered running a crowd-funding campaign for this sort of feature (aka potentially very popular).
Re: [HACKERS] parallel mode and parallel contexts
On 7 January 2015 at 13:11, Robert Haas robertmh...@gmail.com wrote:
> On Tue, Jan 6, 2015 at 4:37 PM, Simon Riggs si...@2ndquadrant.com wrote:
>> So when you say "Only the top frame of the transaction state stack is copied" you don't mean the top, you mean the bottom (the latest subxact)? Which then becomes the top in the parallel worker? OK...
> The item most recently added to the stack is properly called the top, but I guess it's confusing in this case because the item on the bottom of the stack is referred to as the TopTransaction. I'll see if I can rephrase that.

Yes, I didn't mean to suggest the confusion was introduced by you - it's just you stepped into the mess by correctly using the word top in a place where its meaning would be opposite to the close-by usage of TopTransaction.

Anyway, feeling good about this now. Thanks for your patience.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [RFC] LSN Map
Alvaro Herrera alvhe...@2ndquadrant.com writes:
> Bruce Momjian wrote:
>> Have you done any measurements to determine how much backup can be skipped using this method for a typical workload, i.e. how many 16MB page ranges are not modified in a typical span between incremental backups?
> That seems entirely dependent on the specific workload.

Maybe, but it's a reasonable question. The benefit obtained from the added complexity/overhead clearly goes to zero if you summarize too much, and it's not at all clear that there's a sweet spot where you win. So I'd want to see some measurements demonstrating that this is worthwhile.

			regards, tom lane
Re: [HACKERS] pg_rewind in contrib
What is this define good for? I couldn't understand where it fits; is it just a leftover? +#define pageinfo_set_truncation(forkno, rnode, blkno) datapagemap_set_truncation(pagemap, forkno, rnode, blkno) Please don't name files with generic names such as util.c/h. It's annoying later; for instance when I want to open analyze.c I have to choose the one in parser or commands. (pg_upgrade in particular is already a mess; let's not copy that.) Is the final destiny of this still contrib? There are several votes for putting it in src/bin/ right from the start, which sounds reasonable to me as well. We didn't put pg_basebackup in contrib initially for instance. If we do that, we will need more attention to translatability markers of user-visible error messages; if you want to continue using fprintf() then you will need _() around the literals, but it might be wise to use another function instead so that you can avoid them (say, have something like pg_upgrade's pg_fatal() which you can then list in nls.mk). ... Uh, I notice that you have _() in some places such as slurpFile(). I agree with Andres that it would be better to avoid a second copy of the xlogreader helper stuff. A #FRONTEND define in xlogreader.c seems acceptable, and pg_rewind can have a wrapper function to add extra functionality later, if necessary -- or we can add a flags argument, currently unused, which could later be extended to indicate to read from archive, or something. Copyright lines need updated to 2015 (meh) Our policy on header inclusion says that postgres_fe.h comes first, then system includes, then other postgres headers. So at least this one is wrong: +++ b/contrib/pg_rewind/copy_fetch.c @@ -0,0 +1,586 @@ [...] +#include postgres_fe.h + +#include catalog/catalog.h + +#include sys/types.h +#include sys/stat.h +#include dirent.h +#include fcntl.h +#include unistd.h +#include string.h + +#include pg_rewind.h +#include fetch.h +#include filemap.h +#include datapagemap.h +#include util.h The reason for this IIRC is that system includes could modify behavior of some calls in obscure platforms under obscure circumstances. + while ((xlde = readdir(xldir)) != NULL) + { You need to reset errno here, see commit 6f03927fce038096f53ca67eeab9a. +static int dstfd = -1; +static char dstpath[MAXPGPATH] = ; + +void +open_target_file(const char *path, bool trunc) +{ I think these file-level static vars ought to be declared at top of file, probably with more distinctive names. +void +close_target_file(void) +{ + if (close(dstfd) != 0) + { + fprintf(stderr, error closing destination file \%s\: %s\n, + dstpath, strerror(errno)); + exit(1); + } + + dstfd = -1; + /* fsync? */ +} Did you find an answer for the above question? +static bool +endswith(const char *haystack, const char *needle) +{ + int needlelen = strlen(needle); + int haystacklen = strlen(haystack); + + if (haystacklen needlelen) + return false; + + return strcmp(haystack[haystacklen - needlelen], needle) == 0; +} We already have this function elsewhere. + if (type != FILE_TYPE_REGULAR isRelDataFile(path)) + { + fprintf(stderr, data file in source \%s\ is a directory.\n, path); + exit(1); + } Here you have error messages ending with periods; but not in most other places. +/* + * Does it look like a relation data file? + */ +static bool +isRelDataFile(const char *path) This doesn't consider forks ... why not? If correct, probably it deserves a comment. +struct filemap_t +{ + /* +* New entries are accumulated to a linked list, in process_remote_file +* and process_local_file. 
+*/ + file_entry_t *first; + file_entry_t *last; + int nlist; Uh, this is like the seventh open-coded list implementation in frontend code. Can't we have this using ilist for a change? Existing practice is that SQL keywords are uppercase: + sql = + -- Create a recursive directory listing of the whole data directory\n + with recursive files (path, filename, size, isdir) as (\n + select '' as path, filename, size, isdir from\n + (select pg_ls_dir('.') as filename) as fn,\n + pg_stat_file(fn.filename) as this\n + union all\n + select parent.path || parent.filename || '/' as path,\n +fn, this.size, this.isdir\n + from files as parent,\n + pg_ls_dir(parent.path || parent.filename) as fn,\n + pg_stat_file(parent.path || parent.filename || '/' || fn) as this\n + where parent.isdir = 't'\n + )\n + -- Using the cte, fetch a listing of the all the files.\n + --\n + -- For
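As an aside on the open-coded list point above, a sketch of what an ilist-based filemap_t might look like follows. It assumes, as Álvaro implies, that lib/ilist.h can be used from pg_rewind's frontend code; the struct members and the helper function are invented for the example.

#include "postgres_fe.h"
#include "lib/ilist.h"

typedef struct file_entry_t
{
	char	   *path;
	/* ... the other per-file fields ... */
	dlist_node	node;			/* membership in filemap_t.entries */
} file_entry_t;

typedef struct filemap_t
{
	dlist_head	entries;		/* list of file_entry_t, in arrival order */
	int			nentries;
} filemap_t;

/* Replaces the hand-rolled first/last/nlist bookkeeping. */
static void
filemap_append(filemap_t *map, file_entry_t *entry)
{
	dlist_push_tail(&map->entries, &entry->node);
	map->nentries++;
}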
Re: [HACKERS] Transactions involving multiple postgres foreign servers
Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote:
> I don't see why would my patch cause inconsistencies. It can cause dangling PREPAREd transactions and I have already acknowledged that fact. Am I missing something?

To me that is the big problem. Where I have run into ad hoc distributed transaction managers it has usually been because a crash left prepared transactions dangling, without cleaning them up when the transaction manager was restarted. This tends to wreak havoc one way or another.

If we are going to include a distributed transaction manager with PostgreSQL, it *must* persist enough information about the transaction ID and where it is used in a way that will survive a subsequent crash before beginning the PREPARE on any of the systems. After all nodes are PREPAREd it must flag that persisted data to indicate that it is now at a point where ROLLBACK is no longer an option. Only then can it start committing the prepared transactions. After the last node is committed it can clear this information. On start-up the distributed transaction manager must check for any distributed transactions left in progress and commit or rollback based on the preceding; doing retries indefinitely until it succeeds or is told to stop.

Doing this incompletely (i.e., not identifying and correctly handling the various failure modes) is IMO far worse than not attempting it. If we could build in something that did this completely and well, that would be a cool selling point; but let's not gloss over the difficulties. We must recognize how big a problem it would be to include a low-quality implementation.

Also, as previously mentioned, it must behave in some reasonable way if a database is not configured to support 2PC, especially since 2PC is off by default in PostgreSQL.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
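To illustrate the ordering Kevin describes, here is a rough libpq-based sketch of the commit path. The log_* helpers are hypothetical placeholders for the durable local state he mentions (each must be flushed to disk before the next step proceeds); rollback and retry handling are omitted, so this is an outline of the sequence, not a working transaction manager.

#include <stdio.h>
#include <stdbool.h>
#include "libpq-fe.h"

/* Hypothetical durable-log helpers: each must fsync its record before returning. */
extern void log_participants(const char *gid, PGconn **nodes, int n);
extern void log_commit_point(const char *gid);	/* after this, rollback is no longer an option */
extern void log_done(const char *gid);

static bool
exec_or_fail(PGconn *conn, const char *sql)
{
	PGresult   *res = PQexec(conn, sql);
	bool		ok = (PQresultStatus(res) == PGRES_COMMAND_OK);

	PQclear(res);
	return ok;
}

/* Returns true once every node has committed; retrying after failures is the caller's job. */
static bool
commit_distributed(const char *gid, PGconn **nodes, int n)
{
	char		sql[256];
	int			i;

	/* 1. Persist the global transaction id and participant list BEFORE any PREPARE. */
	log_participants(gid, nodes, n);

	/* 2. PREPARE on every node; a failure here still allows rolling everything back. */
	for (i = 0; i < n; i++)
	{
		snprintf(sql, sizeof(sql), "PREPARE TRANSACTION '%s'", gid);
		if (!exec_or_fail(nodes[i], sql))
			return false;		/* roll back the already-prepared nodes and give up */
	}

	/* 3. Persist the commit decision: from here on, only COMMIT PREPARED (with retries). */
	log_commit_point(gid);

	/* 4. Commit everywhere; after a crash, recovery re-reads the log and retries this loop. */
	for (i = 0; i < n; i++)
	{
		snprintf(sql, sizeof(sql), "COMMIT PREPARED '%s'", gid);
		if (!exec_or_fail(nodes[i], sql))
			return false;		/* keep the log entry so the commit can be retried */
	}

	/* 5. Every node has committed: the persisted state can finally be cleared. */
	log_done(gid);
	return true;
}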
Re: [HACKERS] [PATCH] explain sortorder
Hi,

we have spent the last days implementing your suggestions in the patch. It affects the result of an EXPLAIN statement (even in non-verbose mode). Now you will get the ordering information for every single sort key which is not ordered by the defaults.

best regards,

Marius

---
Marius Timmer
Zentrum für Informationsverarbeitung
Westfälische Wilhelms-Universität Münster
Einsteinstraße 60
mtimm...@uni-muenster.de

explain_sortorder-v6.patch
Description: explain_sortorder-v6.patch
[HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
Hi folks, I was having a problem importing json data with COPY. Lots of things export data nicely as one json blob per line. This is excellent for directly importing into a JSON/JSONB column for analysis… Except when there’s an embedded doublequote. Or anything that’s escaped. COPY handles this, but by the time the escaped char hit the JSON parser, it's not escaped anymore. This breaks the JSON parsing. This means I need to manipulate the input data to double-escape it. See bug #12320 for an example. Yuck.

I propose this small patch that simply allows specifying COPY … ESCAPE without requiring the CSV parser. It will make it much easier to directly use json formatted export data for folks going forward. This seemed like the simplest route.

Usage is simply:

postgres=# copy t1 from '/Users/nok/Desktop/queries.json';
ERROR:  invalid input syntax for type json
DETAIL:  Token "root" is invalid.
CONTEXT:  JSON data, line 1: ...1418066241619 AND =1418671041621) AND user:"root...
COPY t1, line 3, column bleh: "{"timestamp":"2014-12-15T19:17:32.505Z","duration":7.947,"query":{"query":{"filtered":{"filter":{"qu..."
postgres=# copy t1 from '/Users/nok/Desktop/queries.json' escape '';
COPY 1966

I’ve included regression tests, and all existing tests pass. This is my first contribution, so be kind to me. :)

escape-without-csv.patch
Description: Binary data

Begin forwarded message:

From: Francisco Olarte fola...@peoplecall.com
Date: January 6, 2015 at 1:52:28 PM EST
Subject: Re: [BUGS] BUG #12320: json parsing with embedded double quotes
To: Aaron Botsis aa...@bt-r.com
Cc: postg...@bt-r.com, "pgsql-b...@postgresql.org" pgsql-b...@postgresql.org

Hi Aaron:

On Tue, Jan 6, 2015 at 7:06 PM, Aaron Botsis aa...@bt-r.com wrote:
> Hi Francisco, I’m aware, but still consider this to be a bug, or at least a great opportunity for an enhancement. :)

Maybe, but you are going to have a problem.

> This had bitten me for the third time while trying to import some json data. It’d be great to bypass the copy escaping (and possibly other meta characters) when the column type is json or jsonb. I’d be happy to try and write it and submit a patch if folks believe this is an acceptable way to go… That said, I should probably read what the process is for this kind of thing :)

Reading this, you are talking about 'the column being json'. COPY needs to do the escaping at the same time it's constructing the columns. The present way is easy to do, read char by char, if it's a escape, process next char acumulating into current field, otherwise see whether it is a field or record separator and act accordingly. It's also layered, when you construct the records for copy you get all the field data, turn them into escaped strings, join them by the field separator and spit them out followed by a record separator ( practical implementations may do this virtually ). Intermixing this with the 'I'm in a json column' would need to pass information from the upper layer, and make it more difficult and, specially error prone. What do you do if ( using the standard delimiters ) your json value has embeded newlines and tabs ( which, IIRC, are legal in several places inside the json ). And all this to make some incorrectly formatted files read ( which can be correctly formatted with a perl one liner or something similar ). I'm not the one to decide, but I will vote against including that ( but do not trust me too much, I would also vote against including 'csv' which I consider the root of many evils ).

Francisco Olarte.
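Francisco's char-by-char description corresponds roughly to the sketch below. It is heavily simplified: the real COPY code in copy.c also deals with multibyte encodings, \N, CSV quoting and end-of-data markers, none of which is shown here.

#include "postgres.h"
#include "lib/stringinfo.h"

/*
 * Simplified sketch of text-format COPY field splitting: the escape
 * character is consumed while the line is split into attributes, which is
 * why an escaped double quote no longer looks escaped by the time the
 * field value reaches the json input function.
 */
static void
split_line(const char *line, char escapec, char delimc)
{
	StringInfoData field;
	const char *p;

	initStringInfo(&field);
	for (p = line; *p != '\0'; p++)
	{
		if (*p == escapec && p[1] != '\0')
		{
			/* keep the next character literally, dropping the escape itself */
			appendStringInfoChar(&field, *++p);
		}
		else if (*p == delimc)
		{
			/* end of one attribute; hand field.data to the input function */
			resetStringInfo(&field);
		}
		else
			appendStringInfoChar(&field, *p);
	}
}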
Re: [HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
Ack. Original had a bug. Let’s try again.

escape-without-csv-v2.patch
Description: Binary data

On Jan 7, 2015, at 8:25 AM, Aaron Botsis aa...@bt-r.com wrote:
> [quoted text of the previous message trimmed]
Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg
Hi, On 23.12.2014 10:16, Jeff Davis wrote: It seems that these two patches are being reviewed together. Should I just combine them into one? My understanding was that some wanted to review the memory accounting patch separately. On Sun, 2014-12-21 at 20:19 +0100, Tomas Vondra wrote: That's the only conflict, and after fixing it it compiles OK. However, I got a segfault on the very first query I tried :-( If lookup_hash_entry doesn't find the group, and there's not enough memory to create it, then it returns NULL; but the caller wasn't checking for NULL. My apologies for such a trivial mistake, I was doing most of my testing using DISTINCT. My fix here was done quickly, so I'll take a closer look later to make sure I didn't miss something else. New patch attached (rebased, as well). I did a review today, using these two patches: * memory-accounting-v9.patch (submitted on December 2) * hashagg-disk-20141222.patch I started with some basic performance measurements comparing hashagg queries without and with the patches (while you compared hashagg and sort). That's IMHO an interesting comparison, especially when no batching is necessary - in the optimal case the users should not see any slowdown (we shouldn't make them pay for the batching unless it actually is necessary). So I did this: drop table if exists test_hash_agg; create table test_hash_agg as select i AS a, mod(i,100) AS b, mod(i,10) AS c, mod(i,1) AS d, mod(i,1000) AS e, mod(i,100) AS f from generate_series(1,1000) s(i); vacuum (full, analyze) test_hash_agg; i.e. a ~500MB table with 10M rows, and columns with different cardinalities. And then queries like this: select count(*) from (select a, count(a) from test_hash_agg group by a) foo; -- 10M groups (OOM) select count(*) from (select a, array_agg(a) from test_hash_agg group by a) foo; -- 100 groups select count(*) from (select f, array_agg(f) from test_hash_agg group by f) foo; which performed quite well, i.e. I've seen absolutely no slowdown. Which, in the array_agg case, is quite is quite suspicious, because on every lookup_hash_entry call, it has to do MemoryContextMemAllocated() on 10M contexts, and I really doubt that can be done in ~0 time. So I started digging in the code and I noticed this: hash_mem = MemoryContextMemAllocated(aggstate-hashcontext, true); which is IMHO obviously wrong, because that accounts only for the hashtable itself. It might be correct for aggregates with state passed by value, but it's incorrect for state passed by reference (e.g. Numeric, arrays etc.), because initialize_aggregates does this: oldContext = MemoryContextSwitchTo(aggstate-aggcontext); pergroupstate-transValue = datumCopy(peraggstate-initValue, peraggstate-transtypeByVal, peraggstate-transtypeLen); MemoryContextSwitchTo(oldContext); and it's also wrong for all the user-defined aggretates that have no access to hashcontext at all, and only get reference to aggcontext using AggCheckCallContext(). array_agg() is a prime example. In those cases the patch actually does no memory accounting and as hashcontext has no child contexts, there's no accounting overhead. After fixing this bug (replacing hashcontext with aggcontext in both calls to MemoryContextMemAllocated) the performance drops dramatically. 
For the query with 100 groups (not requiring any batching) I see this: test=# explain analyze select count(x) from (select f, array_agg(1) AS x from test_hash_agg group by f) foo; QUERY PLAN (original patch) Aggregate (cost=213695.57..213695.58 rows=1 width=32) (actual time=2539.156..2539.156 rows=1 loops=1) - HashAggregate (cost=213693.07..213694.32 rows=100 width=4) (actual time=2492.264..2539.012 rows=100 loops=1) Group Key: test_hash_agg.f Batches: 1 Memory Usage: 24kB Disk Usage:0kB - Seq Scan on test_hash_agg (cost=0.00..163693.71 rows=871 width=4) (actual time=0.022..547.379 rows=1000 loops=1) Planning time: 0.039 ms Execution time: 2542.932 ms (7 rows) QUERY PLAN (fixed patch) Aggregate (cost=213695.57..213695.58 rows=1 width=32) (actual time=5670.885..5670.885 rows=1 loops=1) - HashAggregate (cost=213693.07..213694.32 rows=100 width=4) (actual time=5623.254..5670.803 rows=100 loops=1) Group Key: test_hash_agg.f Batches: 1 Memory Usage: 117642kB Disk Usage:0kB - Seq Scan on
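For reference, the fix Tomas describes boils down to accounting against the context the transition values actually live in, roughly as sketched below. This follows the hashagg-disk patch under review, not committed code, and the work_mem comparison is only illustrative.

/*
 * Account for everything allocated under aggcontext (which includes
 * pass-by-reference transition values such as array_agg state), not just
 * the hash table living under hashcontext.
 */
hash_mem = MemoryContextMemAllocated(aggstate->aggcontext, true);

if (hash_mem > work_mem * 1024L)
{
	/* stop creating new groups; route further tuples to the batch/spill path */
}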
Re: [HACKERS] [PATCH] explain sortorder
V6 of this patch applies, builds and checks against the current HEAD. The areas below could use some attention. In explain.c: malloc() should not be called directly here. palloc() would be the correct call, I believe, but the functions in stringinfo.h are probably your best choice as they remove the necessity for dealing with buffer size and overflow. There is leftover commented out code from the previous patch version in the T_Sort case. In show_sort_group_keys(), the splitting of the existing declaration and initialization of the keyresno and target seems unnecessary and against the style of surrounding code. Multi-line comments should follow the existing format. There are no tests for the ... is LC_COLLATE and COLLATE... cases. Section 14.1 of the documentation may need to be updated. Mike. __ *Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley* 1750 Wallace Ave | St Charles, IL 60174-3401 Office: 630.313.7818 mike.blackw...@rrd.com http://www.rrdonnelley.com http://www.rrdonnelley.com/ * mike.blackw...@rrd.com* On Wed, Jan 7, 2015 at 10:17 AM, Timmer, Marius marius.tim...@uni-muenster.de wrote: Hi, we have spent the last days to realize your suggestions in the patch. It affects the result of a EXPLAIN-Statement (even in non-verbose-mode). Now you will get the order-information for every single sort-key which is not ordered by the defaults. best regards, Marius --- Marius Timmer Zentrum für Informationsverarbeitung Westfälische Wilhelms-Universität Münster Einsteinstraße 60 mtimm...@uni-muenster.de
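On the malloc() point above, the usual backend idiom would be the StringInfo API, along these lines (a generic sketch, not the actual explain.c change; the function name and arguments are made up for illustration):

#include "lib/stringinfo.h"

static char *
describe_sort_order(const char *opname, bool nulls_first)
{
	StringInfoData buf;

	initStringInfo(&buf);			/* palloc'd buffer, grows as needed */
	appendStringInfo(&buf, " USING %s", opname);
	if (nulls_first)
		appendStringInfoString(&buf, " NULLS FIRST");

	return buf.data;				/* released with the memory context */
}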
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
* Dean Rasheed (dean.a.rash...@gmail.com) wrote: On 6 January 2015 at 20:44, Peter Geoghegan p...@heroku.com wrote: Another issue that I see is that there is only one resultRelation between the INSERT and UPDATE. That means that without some additional special measure, both UPDATE and INSERT WITH CHECK ( ... ) options are applied regardless of whether the INSERT path was taken, or the UPDATE path. Maybe that's actually sensible (note that even this doesn't happen right now, since the auxiliary UPDATE is currently minimally processed by the rewriter). What do people think about that? It seems like it might be less surprising to have that fail earlier when there are separate policies for INSERT and UPDATE -- for UPSERT, all policies are applied regardless of which path is taken. I think the policies applied should depend on the path taken, so if it does an INSERT, then only the INSERT CHECK policy should be applied (after the insert), but if it ends up doing an UPDATE, I would expect the UPDATE USING policy to be applied (before the update) and the UPDATE CHECK policy to be applied (after the update). I would not expect the INSERT CHECK policy to be applied on the UPDATE path. You're making a distinction where I'm not convinced there should be one. While I understand that, internally, there are two paths (INSERT vs. UPDATE), it's still only one command for the user and their goal is simply update if this exists, insert if it doesn't. I also don't like the idea that the policy to be applied depends on if it ends up being an INSERT or an UPDATE. I liked Peter's suggestion that the row must pass both WITH CHECK conditions- at least that's consistent and understandable. As to whether the UPDATE USING policy should cause an error to be thrown if it is not satisfied, my inclination would be to not error, and use the command tag to report that no rows were updated, since that is what would happen with a regular UPDATE. Yes, for a regular UPDATE that's what would happen- but this isn't a regular UPDATE, it's an UPSERT. Consider how users handle this now- they have routines which continually try to do one or the other until either the INSERT or the UPDATE is successful. I've never seen such a function, properly written, that says try to INSERT, if that fails, try to UPDATE and if that doesn't update anything, then simply exit. What is the use-case for that? So overall INSERT .. ON CONFLICT UPDATE would be consistent with either an INSERT or an UPDATE, depending on whether the row existed beforehand, which is easier to explain than having some special UPSERT semantics. Any semantics which result in a no-op would be surprising for users of UPSERT. I like the idea of failing earlier- if you try to INSERT .. ON CONFLICT UPDATE a row which you're not allowed to INSERT, erroring strikes me as the right result. If you try to INSERT .. ON CONFLICT UPDATE a row which you're not allowed to UPDATE, I'd also think an error would be the right answer, even if you don't end up doing an actual UPDATE- you ran a command asking for an UPDATE to happen under a certain condition (the row already exists) and the policy states that you're not allowed to do such an UPDATE. As for the UPDATE's USING clause, if you're not allowed to view the records to be updated (which evidently exists because the INSERT failed), I'd handle that the same way we handle the case that a SELECT policy might not allow a user to see rows that they can attempt to INSERT- they'll get an error back in that case too. 
In the end, these are edge cases as policies are generally self consistent and so I think we have some leeway, but I don't think we have the right to turn an INSERT .. ON CONFLICT UPDATE into a no-op. That would be like allowing an INSERT to be a no-op. Thanks, Stephen signature.asc Description: Digital signature
[HACKERS] Re: [COMMITTERS] pgsql: Change how first WAL segment on new timeline after promotion is
On 01/06/2015 03:22 PM, Andres Freund wrote:
> On 2015-01-05 18:45:22 +0200, Heikki Linnakangas wrote:
>> On 01/03/2015 08:59 PM, Andres Freund wrote:
>>> Did you perhaps intend to use XLogFileInit(use_existing = true) instead of XLogFileOpen()? That works for me.
>> Hmm, that doesn't sound right either. XLogFileInit is used when you switch to a new segment, not to open an old segment for writing. It happens to work, because with use_existing = true it will in fact always open the old segment, instead of creating a new one, but I don't think that's in the spirit of how that function's intended to be used.
> Well, its docs say "Create a new XLOG file segment, or open a pre-existing one.", so it's not that mismatched. We really don't know whether the EndOfLog's segment already exist in this scenario. Also, doesn't XLogWrite() essentially use it in the same way?

XLogWrite() is smarter. It uses XLogFileInit() when switching to a new segment, and XLogFileOpen when writing to the middle of a segment.

Committed the fix to not open the segment at all.

- Heikki
Re: [HACKERS] Turning recovery.conf into GUCs
Andres Freund and...@2ndquadrant.com wrote:
> On 2015-01-06 17:08:20 -0800, Josh Berkus wrote:
>> On 01/06/2015 04:42 PM, Andres Freund wrote:
>>> On 2015-01-06 16:33:36 -0800, Josh Berkus wrote:
>>>> F. Inability to remaster without restarting the replica.
>>> That has pretty much nothing to do with the topic at hand.
>> It has *everything* to do with the topic at hand. The *only* reason we can't remaster without restarting is that recovery.conf is only read at startup time, and can't be reloaded. http://www.databasesoup.com/2014/05/remastering-without-restarting.html
> Not really. It's a good way to introduce suble and hard to understand corruption and other strange corner cases. Your replication connection was lost/reconnected in the wrong moment? Oops, received/replayed too much. A real solution to this requires more. You need to 1) disable writing any wal 2) force catchup of all possible standbys, including apply. Most importantly of the new master. This requires knowing which standbys exist. 3) promote new master. 4) only now allow reconnects.

I'm not following. As long as each standby knows what point it is at in the transaction stream, it could make a request to any transaction source for transactions past that point. If a node which will be promoted to the new master isn't caught up to that point, it should not send anything. When it does get caught up it could start sending transactions past that point, including the timeline switch when it is promoted.

If you were arguing that some changes besides *just* allowing a standby to read from a new source without a restart, OK -- some changes might also be needed for a promoted master to behave as described above; but certainly the ability to read WAL from a new source on the floor would be a prerequisite, and possibly the biggest hurdle to clear.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS
On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed dean.a.rash...@gmail.com wrote:
> I think the policies applied should depend on the path taken, so if it does an INSERT, then only the INSERT CHECK policy should be applied (after the insert), but if it ends up doing an UPDATE, I would expect the UPDATE USING policy to be applied (before the update) and the UPDATE CHECK policy to be applied (after the update). I would not expect the INSERT CHECK policy to be applied on the UPDATE path.

I agree.

> As to whether the UPDATE USING policy should cause an error to be thrown if it is not satisfied, my inclination would be to not error, and use the command tag to report that no rows were updated, since that is what would happen with a regular UPDATE.

My inclination would be to error, but I'd be OK with your proposal.

> So overall INSERT .. ON CONFLICT UPDATE would be consistent with either an INSERT or an UPDATE, depending on whether the row existed beforehand, which is easier to explain than having some special UPSERT semantics.

Yeah. We won't escape the question so easily where triggers are concerned, but at least for RLS it seems like it should be possible to avoid confusing, one-off semantics.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] HINTing on UPDATE foo SET foo.bar = ..;
We're waiting for an updated version here, right?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] HINTing on UPDATE foo SET foo.bar = ..;
On 1/7/15 6:22 PM, Alvaro Herrera wrote:
> We're waiting for an updated version here, right?

Yeah. (The CF entry is also set to Waiting on Author, which seems appropriate.)

.marko