[HACKERS] [RFC] LSN Map

2015-01-07 Thread Marco Nenciarini
Hi Hackers,

In order to make incremental backup
(https://wiki.postgresql.org/wiki/Incremental_backup) efficient, we
need a way to track the LSN of each page such that it can be retrieved
without reading the actual block. Below is my proposal for how to
achieve this.

LSN Map
---

The purpose of the LSN map is to quickly know if a page of a relation
has been modified after a specified checkpoint.

Implementation
--

We create an additional fork which contains a raw stream of LSNs. To
limit the space used, every entry represents the maximum LSN of a group
of blocks of a fixed size. I arbitrarily chose a group size of 2048
blocks, which is equivalent to 16MB of heap data; this means we need
64k entries to track one terabyte of heap.
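
To make the grouping arithmetic concrete, here is a minimal,
self-contained sketch; the constant and function names are illustrative
only, not taken from the patch:

    #include <stdint.h>
    #include <stdio.h>

    #define BLCKSZ              8192    /* standard heap block size */
    #define LSNMAP_GROUP_SIZE   2048    /* blocks summarized by one map entry */

    typedef uint64_t XLogRecPtr;        /* an LSN is a 64-bit WAL position */

    /* Which map entry covers a given heap block? */
    static size_t
    lsnmap_entry_for_block(uint32_t blkno)
    {
        return blkno / LSNMAP_GROUP_SIZE;
    }

    /* Remember the per-group maximum LSN when a block in the group changes. */
    static void
    lsnmap_bump(XLogRecPtr *map, uint32_t blkno, XLogRecPtr new_lsn)
    {
        size_t      entry = lsnmap_entry_for_block(blkno);

        if (map[entry] < new_lsn)
            map[entry] = new_lsn;
    }

    int
    main(void)
    {
        XLogRecPtr  map[4] = {0};
        /* 1TB of heap = 2^40 / BLCKSZ blocks = 64k entries of 16MB each */
        uint64_t    blocks_per_tb = (1024ULL * 1024 * 1024 * 1024) / BLCKSZ;

        printf("entries per TB: %llu\n",
               (unsigned long long) (blocks_per_tb / LSNMAP_GROUP_SIZE));

        lsnmap_bump(map, 5000, 42);     /* block 5000 falls in entry 2 */
        printf("entry 2 max LSN: %llu\n", (unsigned long long) map[2]);
        return 0;
    }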

Name
----

I've called this map the LSN map, and I've named the corresponding
fork file "lm".

WAL logging
---

At the moment the map is not WAL-logged, but it is updated during WAL
replay. I'm not deep enough into the WAL mechanics to tell whether the
current approach is sane or whether we should change it.

Current limits
--

The current implementation tracks only heap LSNs. It does not currently
track indexes of any kind, but that can easily be added later. The
handling of commands that rewrite the whole table can also be
improved: CLUSTER uses shared memory buffers instead of writing the
map directly to disk, and moving a table to another tablespace
simply drops the map instead of updating it correctly.

Further ideas
-

The current implementation updates an entry in the map every time a
block gets its LSN bumped, but we really only need to know which is the
first checkpoint that contains expired data. So setting the entry to
the last checkpoint's LSN is probably enough, and it will reduce the
number of writes. To implement this we only need a backend-local copy
of the last checkpoint LSN, updated during each XLogInsert. Again,
I'm not deep enough into the replication mechanics to tell whether this
approach could work on a standby using restartpoints instead of
checkpoints. Please advise on the best way to implement it.
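
A rough sketch of that idea, reusing the types and constants from the
sketch above (the variable and function names are hypothetical, not
from the patch):

    /* Backend-local copy of the last checkpoint's LSN, refreshed around each
     * XLogInsert(); how to keep it current is exactly the open question. */
    static XLogRecPtr cached_last_checkpoint_lsn = 0;

    /* Mark the group containing blkno as modified since the last checkpoint.
     * Repeated bumps between two checkpoints rewrite the entry only once,
     * which is the write reduction described above. */
    static void
    lsnmap_mark_modified(XLogRecPtr *map, uint32_t blkno)
    {
        size_t      entry = blkno / LSNMAP_GROUP_SIZE;

        if (map[entry] < cached_last_checkpoint_lsn)
            map[entry] = cached_last_checkpoint_lsn;
    }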

Conclusions
-----------

This code is incomplete, and the xlog replay part must be
improved/fixed, but I think it's a good start toward having this
feature. I would appreciate any review, advice or criticism.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it

From 89a943032f0a10fd093c126d15fbf81e5861dbe3 Mon Sep 17 00:00:00 2001
From: Marco Nenciarini marco.nenciar...@2ndquadrant.it
Date: Mon, 3 Nov 2014 17:52:27 +0100
Subject: [PATCH] LSN Map

This is a WIP. Only heap is supported. No indexes, no sequences.
---
 src/backend/access/heap/Makefile  |   2 +-
 src/backend/access/heap/heapam.c  | 239 ++--
 src/backend/access/heap/hio.c |  11 +-
 src/backend/access/heap/lsnmap.c  | 336 ++
 src/backend/access/heap/pruneheap.c   |  10 +
 src/backend/access/heap/rewriteheap.c |  37 +++-
 src/backend/catalog/storage.c |   8 +
 src/backend/commands/tablecmds.c  |   5 +-
 src/backend/commands/vacuumlazy.c |  35 +++-
 src/backend/storage/smgr/smgr.c   |   1 +
 src/common/relpath.c  |   5 +-
 src/include/access/hio.h  |   3 +-
 src/include/access/lsnmap.h   |  28 +++
 src/include/common/relpath.h  |   5 +-
 src/include/storage/smgr.h|   1 +
 15 files changed, 687 insertions(+), 39 deletions(-)
 create mode 100644 src/backend/access/heap/lsnmap.c
 create mode 100644 src/include/access/lsnmap.h

diff --git a/src/backend/access/heap/Makefile b/src/backend/access/heap/Makefile
index b83d496..776ee7d 100644
*** a/src/backend/access/heap/Makefile
--- b/src/backend/access/heap/Makefile
*** subdir = src/backend/access/heap
*** 12,17 
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o 
visibilitymap.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o 
visibilitymap.o lsnmap.o
  
  include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 21e9d06..9486562 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
***
*** 48,53 
--- 48,54 
  #include access/tuptoaster.h
  #include access/valid.h
  #include access/visibilitymap.h
+ #include access/lsnmap.h
  #include access/xact.h
  #include access/xlog.h
  #include access/xloginsert.h
*** heap_insert(Relation relation, HeapTuple
*** 2067,2073 
TransactionId xid = GetCurrentTransactionId();
HeapTuple   heaptup;
Buffer  buffer;
!   

Re: [HACKERS] XLOG_PARAMETER_CHANGE handling of wal_log_hints

2015-01-07 Thread Petr Jelinek

On 07/01/15 00:59, Michael Paquier wrote:

On Wed, Jan 7, 2015 at 4:24 AM, Petr Jelinek p...@2ndquadrant.com wrote:

Hi,

when I was fixing how commit_ts handles XLOG_PARAMETER_CHANGE I noticed
that for wal_log_hints we assign the value in ControlFile to the current
value instead of the value that comes from WAL.

ISTM it has only informational value, so it should not have any practical
impact, but it looks like a bug anyway, so here is a simple patch to change
that.

Right. That's something that should get fixed in 9.4 as well..

  ControlFile->track_commit_timestamp = track_commit_timestamp;
The new value of track_commit_timestamp should be taken as well from
xlrec on master btw.



That's part of the larger fix for CommitTs that I sent separately in 
response to the bug report from Fujii - there is much more to be done 
there for CommitTs than just this.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] Incremental backup v3: incremental PoC

2015-01-07 Thread Marco Nenciarini
On 06/01/15 14:26, Robert Haas wrote:
 I suggest leaving this out altogether for the first version.  I can
 think of three possible ways that we can determine which blocks need
 to be backed up.  One, just read every block in the database and look
 at the LSN of each one.  Two, maintain a cache of LSN information on a
 per-segment (or smaller) basis, as you suggest here.  Three, scan the
 WAL generated since the incremental backup and summarize it into a
 list of blocks that need to be backed up.  This last idea could either
 be done when the backup is requested, or it could be done as the WAL
 is generated and used to populate the LSN cache.  In the long run, I
 think some variant of approach #3 is likely best, but in the short
 run, approach #1 (scan everything) is certainly easiest.  While it
 doesn't optimize I/O, it still gives you the benefit of reducing the
 amount of data that needs to be transferred and stored, and that's not
 nothing.  If we get that much working, we can improve things more
 later.
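
For reference, approach #1 boils down to something like the following
standalone sketch: walk a relation segment block by block and read
pd_lsn straight out of each page header. It assumes 8kB blocks, the
server's native byte order, and a quiesced file; a real implementation
would run server-side and simply use PageGetLSN():

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define BLCKSZ 8192

    int
    main(int argc, char **argv)
    {
        /* usage: scan_lsn <segment file> <threshold LSN as decimal> */
        FILE       *f;
        unsigned char page[BLCKSZ];
        uint64_t    threshold;
        long        blkno = 0;

        if (argc != 3)
            return 1;
        threshold = strtoull(argv[2], NULL, 10);
        f = fopen(argv[1], "rb");
        if (f == NULL)
            return 1;

        while (fread(page, 1, BLCKSZ, f) == BLCKSZ)
        {
            uint32_t    xlogid,
                        xrecoff;
            uint64_t    lsn;

            /* pd_lsn is the first page header field: {xlogid, xrecoff} */
            memcpy(&xlogid, page, sizeof(xlogid));
            memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));
            lsn = ((uint64_t) xlogid << 32) | xrecoff;

            if (lsn > threshold)
                printf("block %ld changed (lsn %llu)\n",
                       blkno, (unsigned long long) lsn);
            blkno++;
        }
        fclose(f);
        return 0;
    }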
 

Hi,
The patch now uses approach #1, but I've just sent a patch that uses
approach #2.

54ad016e.9020...@2ndquadrant.it

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] parallel mode and parallel contexts

2015-01-07 Thread Simon Riggs
On 6 January 2015 at 21:37, Simon Riggs si...@2ndquadrant.com wrote:

 I get it now and agree

Yes, very much.

Should we copy both the top-level frame and the current subxid?
Hot Standby links subxids directly to the top-level, so this would be similar.

If we copied both, we wouldn't need to special case the Get functions.
It would still be O(1).

 but please work some more on clarity of
 README, comments and variable naming!

Something other than top sounds good.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-07 Thread Dean Rasheed
On 6 January 2015 at 20:44, Peter Geoghegan p...@heroku.com wrote:
 Another issue that I see is that there is only one resultRelation
 between the INSERT and UPDATE. That means that without some additional
 special measure, both UPDATE and INSERT WITH CHECK ( ... )  options
 are applied regardless of whether the INSERT path was taken, or the
 UPDATE path. Maybe that's actually sensible (note that even this
 doesn't happen right now, since the auxiliary UPDATE is currently
 minimally processed by the rewriter). What do people think about that?
 It seems like it might be less surprising to have that fail earlier
 when there are separate policies for INSERT and UPDATE -- for UPSERT,
 all policies are applied regardless of which path is taken.


I think the policies applied should depend on the path taken, so if it
does an INSERT, then only the INSERT CHECK policy should be applied
(after the insert), but if it ends up doing an UPDATE, I would expect
the UPDATE USING policy to be applied (before the update) and the
UPDATE CHECK policy to be applied (after the update). I would not
expect the INSERT CHECK policy to be applied on the UPDATE path.

As to whether the UPDATE USING policy should cause an error to be
thrown if it is not satisfied, my inclination would be to not error,
and use the command tag to report that no rows were updated, since
that is what would happen with a regular UPDATE.

So overall INSERT .. ON CONFLICT UPDATE would be consistent with
either an INSERT or an UPDATE, depending on whether the row existed
beforehand, which is easier to explain than having some special UPSERT
semantics.

Regards,
Dean




Re: [HACKERS] Re: Patch to add functionality to specify ORDER BY in CREATE FUNCTION for SRFs

2015-01-07 Thread Nicolas Barbier
2015-01-05 Tom Lane t...@sss.pgh.pa.us:

 What would make sense to me is to teach the planner about inlining
 SQL functions that include ORDER BY clauses, so that the performance
 issue of a double sort could be avoided entirely transparently to
 the user.

Another way of getting to the point where the extra check-node is not
needed in obvious cases would be:

* Apply the current patch in some form.
* Later, add code that analyzes the query inside the function. If it
turns out that the result of the analysis implies the declared order,
don't add the check-node.

The analysis can in principle also be performed for other languages,
but that would most likely be way more complex for the typical Turing
complete languages.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-01-07 Thread Ashutosh Bapat
On Sat, Jan 3, 2015 at 2:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
  While looking at the patch for supporting inheritance on foreign tables,
 I
  noticed that if a transaction makes changes to more than two foreign
  servers the current implementation in postgres_fdw doesn't make sure that
  either all of them rollback or all of them commit their changes, IOW
 there
  is a possibility that some of them commit their changes while others
  rollback theirs.

  PFA patch which uses 2PC to solve this problem. In pgfdw_xact_callback(),
  at the XACT_EVENT_PRE_COMMIT event it prepares the transaction at all the
  foreign PostgreSQL servers, and at the XACT_EVENT_COMMIT or XACT_EVENT_ABORT
  event it commits or aborts those transactions respectively.
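
(For readers skimming the thread, the proposed flow amounts to roughly
the following at the libpq level. This is a simplified illustration,
not the actual patch: connection lookup, GID generation and error
recovery are all elided, and the function names are invented.)

    #include <stdio.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    /* Run a utility command on one remote server, warning on failure. */
    void
    exec_or_warn(PGconn *conn, const char *sql)
    {
        PGresult   *res = PQexec(conn, sql);

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "remote command failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    /* At local pre-commit: prepare the remote transaction under a chosen GID. */
    void
    remote_prepare(PGconn *conn, const char *gid)
    {
        char        sql[256];

        snprintf(sql, sizeof(sql), "PREPARE TRANSACTION '%s'", gid);
        exec_or_warn(conn, sql);
    }

    /* After the local transaction commits or aborts: resolve the prepared one. */
    void
    remote_finish(PGconn *conn, const char *gid, bool commit)
    {
        char        sql[256];

        snprintf(sql, sizeof(sql), "%s PREPARED '%s'",
                 commit ? "COMMIT" : "ROLLBACK", gid);
        exec_or_warn(conn, sql);
    }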

 TBH, I think this is a pretty awful idea.

 In the first place, this does little to improve the actual reliability
 of a commit occurring across multiple foreign servers; and in the second
 place it creates a bunch of brand new failure modes, many of which would
 require manual DBA cleanup.

 The core of the problem is that this doesn't have anything to do with
 2PC as it's commonly understood: for that, you need a genuine external
 transaction manager that is aware of all the servers involved in a
 transaction, and has its own persistent state (or at least a way to
 reconstruct its own state by examining the per-server states).
 This patch is not that; in particular it treats the local transaction
 asymmetrically from the remote ones, which doesn't seem like a great
 idea --- ie, the local transaction could still abort after committing
 all the remote ones, leaving you no better off in terms of cross-server
 consistency.

 As far as failure modes go, one basic reason why this cannot work as
 presented is that the remote servers may not even have prepared
 transaction support enabled (in fact max_prepared_transactions = 0
 is the default in all supported PG versions).  So this would absolutely
 have to be a not-on-by-default option.


Agreed. We can have a per-foreign-server option which says whether the
corresponding server can participate in 2PC. A transaction spanning
multiple foreign servers with at least one of them not capable of
participating in 2PC will need to be aborted.


 But the bigger issue is that
 leaving it to the DBA to clean up after failures is not a production
 grade solution, *especially* not for prepared transactions, which are
 performance killers if not closed out promptly.  So I can't imagine
 anyone wanting to turn this on without a more robust answer than that.


I purposefully left that outside this patch, since it involves significant
changes in core. If that's necessary for the first cut, I will work on it.



 Basically I think what you'd need for this to be a credible patch would be
 for it to work by changing the behavior only in the PREPARE TRANSACTION
 path: rather than punting as we do now, prepare the remote transactions,
 and report their server identities and gids to an external transaction
 manager, which would then be responsible for issuing the actual commits
 (along with the actual commit of the local transaction).  I have no idea
 whether it's feasible to do that without having to assume a particular
 2PC transaction manager API/implementation.


I doubt that a TM would expect a bunch of GIDs in response to a PREPARE
TRANSACTION command. Per X/Open, xa_prepare() expects an integer return
value specifying whether the PREPARE succeeded or not, plus some piggybacked
statuses.

In the context of foreign tables under an inheritance tree, a single DML
statement can span multiple foreign servers. All such DMLs would then need
to be handled by an external TM. An external TM or application may not have
an exact idea of which foreign servers are going to be affected by a DML
statement. Users may not want to set up an external TM in such cases.
Instead they would expect PostgreSQL to manage such DMLs and transactions
all by itself.

As Robert has suggested in his responses, it would be better to enable
PostgreSQL to manage distributed transactions itself.


 It'd be interesting to hear from people who are using 2PC in production
 to find out if this would solve any real-world problems for them, and
 what the details of the TM interface would need to look like to make it
 work in practice.

 In short, you can't force 2PC technology on people who aren't using it
 already; while for those who are using it already, this isn't nearly
 good enough as-is.

 regards, tom lane




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-01-07 Thread Ashutosh Bapat
On Tue, Jan 6, 2015 at 11:55 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Jan 5, 2015 at 3:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Well, we intentionally didn't couple the FDW stuff closely into
  transaction commit, because of the thought that the far end would not
  necessarily have Postgres-like transactional behavior, and even if it did
  there would be about zero chance of having atomic commit with a
  non-Postgres remote server.  postgres_fdw is a seriously bad starting
  point as far as that goes, because it encourages one to make assumptions
  that can't possibly work for any other wrapper.

 Atomic commit is something that can potentially be supported by many
 different FDWs, as long as the thing on the other end supports 2PC.
 If you're talking to Oracle or DB2 or SQL Server, and it supports 2PC,
 then you can PREPARE the transaction and then go back and COMMIT the
 transaction once it's committed locally.

Getting a cluster-wide
 *snapshot* is probably a PostgreSQL-only thing requiring much deeper
 integration, but I think it would be sensible to leave that as a
 future project and solve the simpler problem first.

  I think the idea I sketched upthread of supporting an external
 transaction
  manager might be worth pursuing, in that it would potentially lead to
  having at least an approximation of atomic commit across heterogeneous
  servers.

 An important threshold question here is whether we want to rely on an
 external transaction manager, or build one into PostgreSQL.  As far as
 this particular project goes, there's nothing that can't be done
 inside PostgreSQL.  You need a durable registry of which transactions
 you prepared on which servers, and which XIDs they correlate to.  If
 you have that, then you can use background workers or similar to go
 retry commits or rollbacks of prepared transactions until it works,
 even if there's been a local crash meanwhile.
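
(Purely as an illustration of the kind of registry entry being
described; the field names are invented, not taken from any patch.)

    #include <stdint.h>
    #include <stdbool.h>

    /* One row per remote transaction we prepared: enough for a background
     * worker to keep retrying COMMIT PREPARED / ROLLBACK PREPARED after a
     * local crash, once the fate of the local transaction is known. */
    typedef struct FdwPreparedXact
    {
        uint32_t    local_xid;      /* local transaction it belongs to */
        uint32_t    server_oid;     /* which foreign server was involved */
        char        gid[200];       /* GID used in PREPARE TRANSACTION */
        bool        resolved;       /* committed or rolled back yet? */
    } FdwPreparedXact;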


 Alternatively, you could rely on an external transaction manager to do
 all that stuff.  I don't have a clear sense of what that would entail,
 or how it might be better or worse than rolling our own.  I suspect,
 though, that it might amount to little more than adding a middle man.
 I mean, a third-party transaction manager isn't going to automatically
 know how to commit a transaction prepared on some foreign server using
 some foreign data wrapper.  It's going to be have to be taught that if
 postgres_fdw leaves a transaction in-medias-res on server OID 1234,
 you've got to connect to the target machine using that foreign
 server's connection parameters, speak libpq, and issue the appropriate
 COMMIT TRANSACTION command.  And similarly, you're going to need to
 arrange to notify it before preparing that transaction so that it
 knows that it needs to request the COMMIT or ABORT later on.  Once
 you've got all of that infrastructure for that in place, what are you
 really gaining over just doing it in PostgreSQL (or, say, a contrib
 module thereto)?


Thanks, Robert, for giving a high-level view of the system needed for
PostgreSQL to be a transaction manager by itself. Agreed completely.



 (I'm also concerned that an external transaction manager might need
 the PostgreSQL client to be aware of it, whereas what we'd really like
 here is for the client to just speak PostgreSQL and be happy that its
 commits no longer end up half-done.)

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] pg_rewind in contrib

2015-01-07 Thread Heikki Linnakangas

On 01/07/2015 01:54 AM, Andrew Dunstan wrote:

I also think it's a great idea. But I think we should consider the name
carefully. pg_resync might be a better name. Strictly, you might not be
quite rewinding, AIUI.


pg_resync sounds too generic. It's true that if the source server has 
changes of its own, then it's more of a sideways movement than 
rewinding, but I think it's nevertheless a good name.


It does always rewind the control file, so that after startup, WAL 
replay begins from the last common point in history between the servers. 
WAL replay will catch up with the source server, which might be ahead of 
last common point, but strictly speaking pg_rewind is not involved at 
that point anymore.


- Heikki





Re: [HACKERS] pg_rewind in contrib

2015-01-07 Thread Heikki Linnakangas

On 01/07/2015 02:17 AM, Andres Freund wrote:

On 2015-01-06 15:39:29 -0500, Peter Eisentraut wrote:
It wouldn't hurt if we could share SimpleXLogPageRead() between
pg_xlogdump and pg_rewind as the differences are more or less
superficial, but that seems simple enough to achieve by putting a
frontend variant in xlogreader.c/h.


It's not that much code. And although they're almost identical now, it's 
not clear that pg_rewind and pg_xlogdump would want the same behaviour 
in the future. pg_rewind might want to read files from a WAL archive, 
for example. (Not that I have any plans for that right now)



If this ends up shipping, it's going to be a massively popular tool.  I
see it as a companion to pg_basebackup.  So it should sort of work the
same way.  One problem is that it doesn't use the replication protocol,
so the setup is going to be inconsistent with pg_basebackup.  Maybe the
replication protocol could be extended to provide the required data.


I'm not particularly bothered by the requirement of also requiring a
normal, not replication, connection. In most cases that'll already be
allowed.


Yeah, it's not a big problem in practice, but it sure would be nice for 
usability if it wasn't required. One less thing to document and prepare for.



Maybe something as simple as "give me this file" would work.


Well, looking at libpq_fetch.c it seems there'd be a bit more required.

Not having to create a temporary table on the other side would be nice -
afaics there's otherwise not much stopping this from working against a
standby?


Hmm, that could be done. The temporary table is convenient, but it could 
be refactored to work without it.



That might lose the local copy mode, but how important is that?
pg_basebackup doesn't have that mode.


But we have plain pg_start/stop_backup for that case. That alternative
doesn't really exist here.


Also, the local mode works when the server is shut down. The 
pg_basebackup analogue of that is just cp -a $PGDATA backup.

- Heikki





Re: [HACKERS] pg_rewind in contrib

2015-01-07 Thread Heikki Linnakangas

On 01/06/2015 10:39 PM, Peter Eisentraut wrote:

If this ends up shipping, it's going to be a massively popular tool.  I
see it as a companion to pg_basebackup.  So it should sort of work the
same way.  One problem is that it doesn't use the replication protocol,
so the setup is going to be inconsistent with pg_basebackup.  Maybe the
replication protocol could be extended to provide the required data.
Maybe something as simple as "give me this file" would work.


Yeah, that would be nice. But I think we can live with it as it is for 
now, and add that later.



That might lose the local copy mode, but how important is that?
pg_basebackup doesn't have that mode.  In any case, the documentation
doesn't explain this distinction.  The option documentation is a bit
short in any case, but it's not clear that you can choose between local
and remote mode.


Changing the libpq mode to use additional replication protocol commands 
would be a localized change to libpq_fetch.c. No need to touch the local 
copy mode.



The test suite should probably be reimplemented in Perl.  (I might be
able to help.)  Again, ingenious, but it's very hard to follow the
sequence of what is being tested.  And some Windows person is going to
complain. ;-)


Yeah, totally agreed.

- Heikki





Re: [HACKERS] Fillfactor for GIN indexes

2015-01-07 Thread Michael Paquier
On Wed, Dec 3, 2014 at 2:37 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov
 aekorot...@gmail.com wrote:
 On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:
 Please find attached a simple patch adding fillfactor as storage parameter
 for GIN indexes. The default value is the same as the current one, i.e. 100,
 so that the pages are completely packed when a GIN index is created.


 That's not true. Let us discuss it a little bit.
 [blah discussion]
That's quite a nice explanation. Thanks!

 My summary is following:
 1) In order to have fully correct support of fillfactor in GIN we need to
 rewrite GIN build algorithm.
 2) Without rewriting GIN build algorithm, not much can be done with entry
 tree. However, you can implement some heuristics.
TBH, I am not really planning to rewrite the whole code.

 3) You definitely need to touch code that selects ratio of split in
 dataPlaceToPageLeaf (starting with if (!btree-isBuild)).
OK I see, so for a split we need to have a calculation based on the
fillfactor, with 75% by default.

 4) GIN data pages are always compressed excepts pg_upgraded indexes from
 pre-9.4. Take care about it in following code.
   if (GinPageIsCompressed(page))
   freespace = GinDataLeafPageGetFreeSpace(page);
 + else if (btree->isBuild)
 + freespace = BLCKSZ * (100 - fillfactor) / 100;
Hm. Simply reversing both conditions is fine, no?
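
To make the question concrete, reversing the two conditions would look
roughly like this inside dataPlaceToPageLeaf() (just the fragment under
discussion, untested):

    /* Give the build case priority, so the fillfactor limit applies to
     * newly built (always-compressed) pages; otherwise fall back to the
     * page's actual free space. */
    if (btree->isBuild)
        freespace = BLCKSZ * (100 - fillfactor) / 100;
    else if (GinPageIsCompressed(page))
        freespace = GinDataLeafPageGetFreeSpace(page);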

 This is a very interesting explanation; thanks for writing it up!
 It does leave me wondering why anyone would want fillfactor for GIN at
 all, even if they were willing to rewrite the build algorithm.
Based on Alexander's explanation, the current GIN code fills a page to
75% on a split, and was even doing 50/50 pre-9.4 if I recall correctly.
IMO a higher fillfactor makes sense for a GIN index that gets fewer
random updates, no?

I am attaching an updated patch, with the default fillfactor value at
75%, and with the page split code using the fillfactor rate.
Thoughts?
-- 
Michael
From 39c6cab320f648b61cc65654ae67e62d8926bfa1 Mon Sep 17 00:00:00 2001
From: Michael Paquier michael@otacoo.com
Date: Fri, 21 Nov 2014 14:08:54 +0900
Subject: [PATCH] Support fillfactor for GIN indexes

Users can set this new storage parameter to control how densely the entry
and leaf pages of a newly built index are packed. The fillfactor ranges
between 20 and 100.
---
 doc/src/sgml/ref/create_index.sgml |  4 ++--
 src/backend/access/common/reloptions.c |  9 +
 src/backend/access/gin/gindatapage.c   | 34 --
 src/backend/access/gin/ginentrypage.c  | 20 +++-
 src/backend/access/gin/ginutil.c   |  3 ++-
 src/include/access/gin_private.h   |  3 +++
 6 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 6b2ee28..c0ba24a 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -294,8 +294,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] replaceable class=
para
 The optional literalWITH/ clause specifies firsttermstorage
 parameters/ for the index.  Each index method has its own set of allowed
-storage parameters.  The B-tree, hash, GiST and SP-GiST index methods all
-accept this parameter:
+storage parameters.  The B-tree, hash, GIN, GiST and SP-GiST index methods
+all accept this parameter:
/para
 
variablelist
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index f008fab..418ecec 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -15,6 +15,7 @@
 
 #include postgres.h
 
+#include access/gin_private.h
 #include access/gist_private.h
 #include access/hash.h
 #include access/htup_details.h
@@ -133,6 +134,14 @@ static relopt_int intRelOpts[] =
 	},
 	{
 		{
+			fillfactor,
+			Packs gin index pages only to this percentage,
+			RELOPT_KIND_GIN
+		},
+		GIN_DEFAULT_FILLFACTOR, GIN_MIN_FILLFACTOR, 100
+	},
+	{
+		{
 			autovacuum_vacuum_threshold,
 			Minimum number of tuple updates or deletes prior to vacuum,
 			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
diff --git a/src/backend/access/gin/gindatapage.c b/src/backend/access/gin/gindatapage.c
index 8e81f6c..eebfa62 100644
--- a/src/backend/access/gin/gindatapage.c
+++ b/src/backend/access/gin/gindatapage.c
@@ -446,11 +446,21 @@ dataPlaceToPageLeaf(GinBtree btree, Buffer buf, GinBtreeStack *stack,
 	leafSegmentInfo *lastleftinfo;
 	ItemPointerData maxOldItem;
 	ItemPointerData remaining;
+	int			fillfactor;
 
 	Assert(GinPageIsData(page));
 
 	rbound = *GinDataPageGetRightBound(page);
 
+	/* Grab option values */
+	if (btree->index->rd_options)
+	{
+		GinOptions *options = (GinOptions *) btree->index->rd_options;
+		fillfactor = options->fillfactor;
+	}
+	else
+		fillfactor = GIN_DEFAULT_FILLFACTOR;
+
 	/*
 	 * Count how many of the new items belong 

Re: [HACKERS] parallel mode and parallel contexts

2015-01-07 Thread Robert Haas
On Tue, Jan 6, 2015 at 4:37 PM, Simon Riggs si...@2ndquadrant.com wrote:
 So when you say "Only the top frame of the transaction state stack is
 copied" you don't mean the top, you mean the bottom (the latest
 subxact)? Which then becomes the top in the parallel worker? OK...

The item most recently added to the stack is properly called the top,
but I guess it's confusing in this case because the item on the bottom
of the stack is referred to as the TopTransaction.  I'll see if I can
rephrase that.

 Those comments really belong in a README, not the first visible
 comment in xact.c

OK.

 You need to start with the explanation that parallel workers have a
 faked-up xact stack to make it easier to copy and manage. That is
 valid because we never change xact state during a worker operation.

OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_rewind in contrib

2015-01-07 Thread Heikki Linnakangas

On 01/06/2015 10:39 PM, Peter Eisentraut wrote:

The test suite should probably be reimplemented in Perl.  (I might be
able to help.)  Again, ingenious, but it's very hard to follow the
sequence of what is being tested.  And some Windows person is going to
complain. ;-)


I took a shot at rewriting the tests in perl. It works, but I would 
appreciate help in reviewing and fixing any bad perl I may have written. 
I have not added new tests or changed the things they test, this is just 
a straightforward translation from the mix of bash scripts and 
pg_regress to perl.


- Heikki



pg_rewind-contrib-3.patch.gz
Description: application/gzip



Re: [HACKERS] psql -c does not honor ON_ERROR_STOP

2015-01-07 Thread Joe Conway

On 01/06/2015 12:36 PM, Tom Lane wrote:
 -c submits the entire string to the backend in one PQexec();
 therefore ON_ERROR_STOP cannot have any impact on its behavior.
 The backend will abandon processing the whole string upon first
 error, embedded begin/commit commands notwithstanding.
 
 There's been repeated discussion of changing -c so that the string
 is split apart and processed more like it would be if it'd been
 read from stdin.  However, given the number of ways you can already
 submit a string via stdin, this wouldn't be buying any new
 functionality.  Even discounting backwards-compatibility
 considerations, such a change would make it completely impossible
 to test multiple-commands-per-PQexec scenarios using psql.  So I'm
 inclined to leave it alone.
 
 Possibly the point is worth documenting explicitly, though.
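
(The single-PQexec() behaviour is easy to demonstrate outside psql with
a few lines of libpq; the connection string below is a placeholder, and
the division by zero abandons everything after it in the string.)

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");  /* placeholder */
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        /* Same thing psql -c does: the whole string goes in one PQexec(). */
        res = PQexec(conn,
                     "begin; select 1/0; commit; select 'never reached';");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "string abandoned at first error: %s",
                    PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }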

OK, thanks. I'll submit a proposal for an update to the docs.

Joe


-- 
Joseph E Conway
credativ LLC

270 E Douglas Ave
El Cajon, CA 92020

Office: +1 619 270 8787
Mobile: +1 619 843 8340

===
USA: http://www.credativ.us
Canada:  http://www.credativ.ca
Germany: http://www.credativ.de
Netherlands: http://www.credativ.nl
UK:  http://www.credativ.co.uk
India:   http://www.credativ.in
===




Re: [HACKERS] Fillfactor for GIN indexes

2015-01-07 Thread Michael Paquier
On Thu, Jan 8, 2015 at 6:31 AM, Alexander Korotkov aekorot...@gmail.com wrote:
 On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier michael.paqu...@gmail.com
 I am attaching an updated patch, with the default fillfactor value at
 75%, and with the page split code using the fillfactor rate.
 Thoughts?
 Rewritten version of patch is attached. I made following changes:

Thanks! With this patch (and my previous version as well), GIN indexes
with the default fillfactor are larger than 9.4 indexes; the 9.4 behavior
is consistent only with fillfactor=100, not with the default of 90. Are we
fine with that?
-- 
Michael




Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-07 Thread Josh Berkus
On 01/07/2015 02:31 PM, Peter Eisentraut wrote:
 On 1/6/15 7:33 PM, Josh Berkus wrote:
 D. Weird name: for those doing only replication, not PITR,
 recovery.conf is completely baffling.
 
 I don't disagree, but standby.enabled or whatever isn't any better,
 for the inverse reason.

Yeah.  Like I said, I posted a list of bugs and features so that we can
test your solution and the pg.conf merger to see how much they actually
improve things.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-07 Thread Peter Geoghegan
On Wed, Jan 7, 2015 at 12:01 PM, Stephen Frost sfr...@snowman.net wrote:
 Other databases have this capability and have triggers and at least one
 ends up firing both INSERT and UPDATE triggers, with many complaints
 from users about how that ends up making the performance suck.  Perhaps
 we could use that as a fallback but support the explicit single trigger
 option too..  Just some thoughts, apologies if it's already been
 covered in depth previously.

I would like to expose whether or not statement-level triggers are
being called in the context of an INSERT ... ON CONFLICT UPDATE, FWIW.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-07 Thread Peter Geoghegan
On Wed, Jan 7, 2015 at 12:26 PM, Stephen Frost sfr...@snowman.net wrote:
 I agree with that up to a point - this case feels more like a POLA
 violation though rather than a run-of-the-mill "you need to test all
 that". I'm a bit worried this might lead to cases where users can't use
 UPSERT because they don't know which set of policies will end up getting
 applied, although the above approach is strictly a subset of the
 approach I'm advocating, but at least with the 'throw it all together'
 case, you know exactly what you're getting with UPSERT.

I think that it makes sense to bunch the policies together, because
INSERT ... ON CONFLICT UPDATE is mostly an INSERT - informally
speaking, the UPDATE part is just enough to resolve a conflict -
this is ultimately just a new clause of INSERT. Besides, I think that
making the permissions as restrictive as possible is the least
surprising behavior. It's clearly the most secure behavior.

I think that we collectively lean towards actively throwing an error
as the preferred behavior - I think that's good, because the intent of
the user to UPDATE some particular tuple that they're not allowed to
UPDATE is clear and unambiguous. As long as we're doing that, clearly
writing a query that will fail based on the "wrong" path being taken
is wrong-headed. Now, you can construct an UPSERT case that a
bunch-the-policies-together approach doesn't serve well. I could be mistaken,
but I think that case is likely to be more than a little contrived.

We're now going to be testing the TARGET.* tuple before going to
update, having failed to insert (and after row locking the TARGET.*
tuple) - the target tuple is really how the update finds the existing
row to update, so it should be tested, without necessarily being
blamed directly (certainly, we cannot leak the TARGET.* tuple values,
so maybe we should try blaming the EXCLUDED.* tuple first for error
messaging reasons). If it fails because of the insert check after row
locking but before update, well, it was liable to anyway, when the
insert path was taken. Conversely, if it failed after actually
inserting where that would not happen with a vanilla INSERT (due to a
bunched-together update USING() security barrier qual or explicit
user-supplied CHECK OPTION), well, you'd have gotten the same thing if
you took the UPDATE path anyway.

That just leaves the regular ExecUpdate() call of
ExecWithCheckOptions(). When that is called with bunched together
policies, you're now testing the final tuple. So it might be a bit
confusing that the policy of final tuples originating from INSERTs is
also enforced here, for the final tuple of an UPDATE. But in order
for even that to be the case, you'd have to have a final tuple that
was substantially different from the one originally proposed for
insertion that made the difference between a qual passing and not
passing. How likely is that in realistic use cases? And besides, isn't
the policy for INSERTs still quite relevant even here? We're talking
about a final tuple that originated from an INSERT statement, after
all.

-- 
Peter Geoghegan




[HACKERS] empty select list allowed when using function as table

2015-01-07 Thread Merlin Moncure
Interestingly, the following query works (it didn't use to):

select from generate_series(1, 1);

Was this intentional?

merlin




Re: [HACKERS] empty select list allowed when using function as table

2015-01-07 Thread Marko Tiikkaja

On 2015-01-08 01:13, Merlin Moncure wrote:

Interestingly, the following query works (it didn't use to):

select from generate_series(1, 1);

Was this intentional?


See 1b4f7f93b4693858cb983af3cd557f6097dab67b


.marko





Re: [HACKERS] Possible typo in create_policy.sgml

2015-01-07 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Tue, Jan 6, 2015 at 4:07 PM, Peter Geoghegan p...@heroku.com wrote:
  On Tue, Jan 6, 2015 at 1:03 PM, Robert Haas robertmh...@gmail.com wrote:
  I thought my rewrite clarified this distinction pretty well.  Maybe
  I'm wrong?  We're talking about the same paragraph.
 
  Sorry, I didn't express myself clearly. I think that you did get it
  right, but I would like to see that distinction also reflected in the
  actual sgml PARAMETER class tag. expression is way too generic here.
 
 I'm happy to see us change that if it makes sense, but I'm not sure
 what that actually does.

If I'm following correctly, Peter's specifically talking about:

[ USING ( <replaceable class="parameter">expression</replaceable> ) ]
[ WITH CHECK ( <replaceable class="parameter">check_expression</replaceable> ) ]

Where the USING parameter is 'expression' but the WITH CHECK parameter
is 'check_expression'.  He makes a good point, I believe, as
expression is overly generic.  I don't like the idea of using
barrier_expression though as that ends up introducing a new term- how
about using_expression?

This would need to be reflected below in the documentation which talks
about expression as the parameter to USING, but I can certainly
handle cleaning that up.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-01-07 Thread Robert Haas
On Tue, Jan 6, 2015 at 4:33 PM, Peter Eisentraut pete...@gmx.net wrote:
 Currently, when you unpack a tarred basebackup with tablespaces, the
 symlinks will tell you whether you have unpacked the tablespace tars at
 the right place.  Otherwise, how do you know?  Secondly, you also have
 the option of putting the tablespaces somewhere else by changing the
 symlinks.

That's a good argument for making the tablespace-map file
human-readable and human-editable, but I don't think it's an argument
for duplicating its contents inaccurately in the filesystem.

 One way to address this would be to do away with the symlinks altogether
 and have pg_tblspc/12345 be a text file that contains the tablespace
 location.  Kind of symlinks implemented in user space.

Well, that's just spreading the tablespace-map file out into several
files, and maybe keeping it around after we've restored from backup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Possible typo in create_policy.sgml

2015-01-07 Thread Robert Haas
On Tue, Jan 6, 2015 at 4:07 PM, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Jan 6, 2015 at 1:03 PM, Robert Haas robertmh...@gmail.com wrote:
 I thought my rewrite clarified this distinction pretty well.  Maybe
 I'm wrong?  We're talking about the same paragraph.

 Sorry, I didn't express myself clearly. I think that you did get it
 right, but I would like to see that distinction also reflected in the
 actual sgml PARAMETER class tag. expression is way too generic here.

I'm happy to see us change that if it makes sense, but I'm not sure
what that actually does.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-07 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed dean.a.rash...@gmail.com wrote:
  I think the policies applied should depend on the path taken, so if it
  does an INSERT, then only the INSERT CHECK policy should be applied
  (after the insert), but if it ends up doing an UPDATE, I would expect
  the UPDATE USING policy to be applied (before the update) and the
  UPDATE CHECK policy to be applied (after the update). I would not
  expect the INSERT CHECK policy to be applied on the UPDATE path.
 
 I agree.

I can certainly understand the appeal of this approach, but I don't
think it makes sense.  Consider what happens later on down the road,
after the code has been written and deployed using INSERT .. ON CONFLICT
UPDATE where 99% of the time only one path or the other is taken.  Then
the other path is taken and suddenly the exact same command and row ends
up returning errors.  Additional testing should have been done to check
if that happens, of course, but I really don't like the idea that the
exact same command, with the exact same policies, would succeed or fail,
due to policies, based on the data in the database.

That's not to say that, generally speaking, the data in the database
shouldn't impact policy failures, but in this case, the policies might
not even reference any other tables in the database.

  As to whether the UPDATE USING policy should cause an error to be
  thrown if it is not satisfied, my inclination would be to not error,
  and use the command tag to report that no rows were updated, since
  that is what would happen with a regular UPDATE.
 
 My inclination would be to error, but I'd be OK with your proposal.

I'm pretty strongly in favor of erroring. :)

  So overall INSERT .. ON CONFLICT UPDATE would be consistent with
  either an INSERT or an UPDATE, depending on whether the row existed
  beforehand, which is easier to explain than having some special UPSERT
  semantics.
 
 Yeah.  We won't escape the question so easily where triggers are
 concerned, but at least for RLS it seems like it should be possible to
 avoid confusing, one-off semantics.

Triggers are another question..

One approach that I don't recall seeing (and I'm not convinced that it's
a good idea either, but thought it deserved mention) is the idea of
having an UPSERT-specific set of policies (and perhaps an
UPSERT-specific trigger...?).  That gets pretty grotty when you consider
existing systems which have INSERT and UPDATE triggers already, but
UPSERT is a pretty big deal and a big change and for users who are just
starting to use it, requiring that they write triggers and policies
specifically to address that case might be acceptable.  That doesn't
address the cases where users have direct SQL access and might be able
to then bypass existing triggers, but we might be able to work out an
answer to that case..

Other databases have this capability and have triggers and at least one
ends up firing both INSERT and UPDATE triggers, with many complaints
from users about how that ends up making the performance suck.  Perhaps
we could use that as a fallback but support the explicit single trigger
option too..  Just some thoughts, apologies if it's already been
covered in depth previously.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Possible typo in create_policy.sgml

2015-01-07 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
 I also don't see this behavior documented (this is from process_policies()):
 
 /*
 * If we end up with only USING quals, then use those as
 * WITH CHECK quals also.
 */
 if (with_check_quals == NIL)
 with_check_quals = copyObject(quals);
 
 Now, I do see a reference to it under Per-Command policies - ALL. It says:
 
 If an INSERT or UPDATE command attempts to add rows to the table
 which do not pass the ALL WITH CHECK (or USING, if no WITH CHECK
 expression is defined) expression, the command will error.
 
 But is that really the right place for it? Does it not equally well
 apply to FOR UPDATE policies, which can on their own have both barrier
 quals and WITH CHECK options? Basically, that seems to me like a
 *generic* property of policies (it's a generic thing that the WITH
 CHECK options/expressions shadow the USING security barrier quals as
 check options), and so should be documented as such.

Ah, yes, good point, I can add more documentation around that.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-07 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jan 7, 2015 at 3:01 PM, Stephen Frost sfr...@snowman.net wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
  On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed dean.a.rash...@gmail.com 
  wrote:
   I think the policies applied should depend on the path taken, so if it
   does an INSERT, then only the INSERT CHECK policy should be applied
   (after the insert), but if it ends up doing an UPDATE, I would expect
   the UPDATE USING policy to be applied (before the update) and the
   UPDATE CHECK policy to be applied (after the update). I would not
   expect the INSERT CHECK policy to be applied on the UPDATE path.
 
  I agree.
 
  I can certainly understand the appeal of this approach, but I don't
  think it makes sense.  Consider what happens later on down the road,
  after the code has been written and deployed using INSERT .. ON CONFLICT
  UPDATE where 99% of the time only one path or the other is taken.  Then
  the other path is taken and suddenly the exact same command and row ends
  up returning errors.
 
 I'd say: that's life.  If you don't test your policies, they might not work.

I agree with that up to a point - this case feels more like a POLA
violation though rather than a run-of-the-mill "you need to test all
that".  I'm a bit worried this might lead to cases where users can't use
UPSERT because they don't know which set of policies will end up getting
applied, although the above approach is strictly a subset of the
approach I'm advocating, but at least with the 'throw it all together'
case, you know exactly what you're getting with UPSERT.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] parallel mode and parallel contexts

2015-01-07 Thread Jim Nasby

On 1/7/15, 9:39 AM, Robert Haas wrote:

sequence.c: Is it safe to read a sequence value in a parallel worker if the
 cache_value is > 1?

No, because the sequence cache isn't synchronized between the workers.
Maybe it would be safe if cache_value == 1, but there's not much
use-case: how often are you going to have a read-only query that uses
a sequence value.  At some point we can look at making sequences
parallel-safe, but worrying about it right now doesn't seem like a
good use of time.


Agreed, I was more concerned with calls to nextval(), which don't seem to be 
prevented in parallel mode?


This may be a dumb question, but for functions do we know that all pl's
besides C and SQL use SPI? If not I think they could end up writing in a
worker.

Well, the lower-level checks would catch that.  But it is generally
true that there's no way to prevent arbitrary C code from doing things
that are unsafe in parallel mode and that we can't tell are unsafe.
As I've said before, I think that we'll need to have a method of
labeling functions as parallel-safe or not, but this patch isn't
trying to solve that part of the problem.


I was more thinking about all the add-on pl's like pl/ruby. But yeah, I don't 
see that there's much we can do if they're not using SPI.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes

2015-01-07 Thread Andrew Dunstan



On 01/07/2015 08:25 AM, Aaron Botsis wrote:
Hi folks, I was having a problem importing json data with COPY. Lots 
of things export data nicely as one json blob per line. This is 
excellent for directly importing into a JSON/JSONB column for analysis.


...Except when there’s an embedded doublequote. Or anything that’s 
escaped. COPY handles this, but by the time the escaped char hits the 
JSON parser, it's not escaped anymore. This breaks the JSON parsing. 
This means I need to manipulate the input data to double-escape it. 
See bug #12320 for an example. Yuck.


I propose this small patch that simply allows specifying COPY … ESCAPE 
without requiring the CSV parser. It will make it much easier to 
directly use json formatted export data for folks going forward. This 
seemed like the simplest route.


Usage is simply:

postgres=# copy t1 from '/Users/nok/Desktop/queries.json';
ERROR:  invalid input syntax for type json
DETAIL:  Token "root" is invalid.
CONTEXT:  JSON data, line 1: ...1418066241619 AND =1418671041621) AND 
user:root...
COPY t1, line 3, column bleh: 
{timestamp:2014-12-15T19:17:32.505Z,duration:7.947,query:{query:{filtered:{filter:{qu...

postgres=# copy t1 from '/Users/nok/Desktop/queries.json' escape '';
COPY 1966





This isn't a bug. Neither CSV format nor TEXT format is particularly 
suitable for json. I'm quite certain I could compose legal json that 
will break your proposal (for example, with an embedded newline in the 
white space).


It's also unnecessary. CSV format, while not designed for this, is 
nevertheless sufficiently flexible to allow successful import of json 
data meeting certain criteria (essentially no newlines), like this:


   copy the_table(jsonfield)
   from '/path/to/jsondata'
   csv quote e'\x01' delimiter e'\x02';


You aren't the first person to encounter this problem. See 
http://adpgtech.blogspot.com/2014/09/importing-json-data.html


Maybe we need to add something like this to the docs, or to the wiki.

Note too my comment in that blog post:

   Now this solution is a bit of a hack. I wonder if there's a case for
   a COPY mode that simply treats each line as a single datum. I also
   wonder if we need some more specialized tools for importing JSON,
   possibly one or more Foreign Data Wrappers. Such things could
   handle, say, embedded newline punctuation.


cheers

andrew







Re: [HACKERS] Possible typo in create_policy.sgml

2015-01-07 Thread Robert Haas
On Wed, Jan 7, 2015 at 3:06 PM, Stephen Frost sfr...@snowman.net wrote:
 If I'm following correctly, Peter's specifically talking about:

 [ USING ( <replaceable class="parameter">expression</replaceable> ) ]
 [ WITH CHECK ( <replaceable class="parameter">check_expression</replaceable> ) ]

 Where the USING parameter is 'expression' but the WITH CHECK parameter
 is 'check_expression'.  He makes a good point, I believe, as
 expression is overly generic.  I don't like the idea of using
 barrier_expression though as that ends up introducing a new term- how
 about using_expression?

Oh.  Well, I guess we could change that.  I don't think it's a
problem, myself.  I thought he was talking about something in the SGML
markup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-07 Thread Robert Haas
On Wed, Jan 7, 2015 at 3:01 PM, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
  I think the policies applied should depend on the path taken, so if it
  does an INSERT, then only the INSERT CHECK policy should be applied
  (after the insert), but if it ends up doing an UPDATE, I would expect
  the UPDATE USING policy to be applied (before the update) and the
  UPDATE CHECK policy to be applied (after the update). I would not
  expect the INSERT CHECK policy to be applied on the UPDATE path.

 I agree.

 I can certainly understand the appeal of this approach, but I don't
 think it makes sense.  Consider what happens later on down the road,
 after the code has been written and deployed using INSERT .. ON CONFLICT
 UPDATE where 99% of the time only one path or the other is taken.  Then
 the other path is taken and suddenly the exact same command and row ends
 up returning errors.

I'd say: that's life.  If you don't test your policies, they might not work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_rewind in contrib

2015-01-07 Thread Andrew Dunstan


On 01/07/2015 03:22 AM, Heikki Linnakangas wrote:

On 01/07/2015 01:54 AM, Andrew Dunstan wrote:

I also think it's a great idea. But I think we should consider the name
carefully. pg_resync might be a better name. Strictly, you might not be
quite rewinding, AIUI.


pg_resync sounds too generic. It's true that if the source server has 
changes of its own, then it's more of a sideways movement than 
rewinding, but I think it's nevertheless a good name.


It does always rewind the control file, so that after startup, WAL 
replay begins from the last common point in history between the 
servers. WAL replay will catch up with the source server, which might 
be ahead of last common point, but strictly speaking pg_rewind is not 
involved at that point anymore.






I understand, but I think pg_rewind is likely to be misleading to many 
users who will say "but I don't want just to rewind".


I'm not wedded to the name I suggested, but I think we should look at 
possible alternative names. We do have experience of misleading names 
causing confusion (e.g. initdb).


cheers

andrew







Re: [HACKERS] [RFC] LSN Map

2015-01-07 Thread Bruce Momjian
On Wed, Jan  7, 2015 at 10:50:38AM +0100, Marco Nenciarini wrote:
 Implementation
 --
 
 We create an additional fork which contains a raw stream of LSNs. To
 limit the space used, every entry represent the maximum LSN of a group
 of blocks of a fixed size. I chose arbitrarily the size of 2048
 which is equivalent to 16MB of heap data, which means that we need 64k
 entry to track one terabyte of heap.

I like the idea of summarizing the LSN to keep its size reasonable.  Have
you done any measurements to determine how much backup can be skipped
using this method for a typical workload, i.e. how many 16MB page ranges
are not modified in a typical span between incremental backups?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [RFC] LSN Map

2015-01-07 Thread Alvaro Herrera
Bruce Momjian wrote:

 Have you done any measurements to determine how much backup can be
 skipped using this method for a typical workload, i.e. how many 16MB
 page ranges are not modified in a typical span between incremental
 backups?

That seems entirely dependent on the specific workload.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] parallel mode and parallel contexts

2015-01-07 Thread Robert Haas
On Tue, Jan 6, 2015 at 9:37 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 CreateParallelContext(): Does it actually make sense to have nworkers=0?
 ISTM that's a bogus case.

I'm not sure whether we'll ever use the zero-worker case in
production, but I've certainly found it useful for
performance-testing.

 Also, since the number of workers will normally be
 determined dynamically by the planner, should that check be a regular
 conditional instead of just an Assert?

I don't think that's really necessary.  It shouldn't be too much of a
stretch for the planner to come up with a non-negative value.

 In LaunchParallelWorkers() the Start Workers comment states that we give
 up registering more workers if one fails to register, but there's nothing in
 the if condition to do that, and I don't see
 RegisterDynamicBackgroundWorker() doing it either. Is the comment just
 incorrect?

Woops, that got changed at some point and I forgot to update the
comment.  Will fix.

 SerializeTransactionState(): instead of looping through the transaction
 stack to calculate nxids, couldn't we just set it to maxsize -
 sizeof(TransactionId) * 3? If we're looping a second time for safety a
 comment explaining that would be useful...

Yeah, I guess we could do that.  I don't think it really matters very
much one way or the other.

 sequence.c: Is it safe to read a sequence value in a parallel worker if the
 cache_value is > 1?

No, because the sequence cache isn't synchronized between the workers.
Maybe it would be safe if cache_value == 1, but there's not much
use-case: how often are you going to have a read-only query that uses
a sequence value?  At some point we can look at making sequences
parallel-safe, but worrying about it right now doesn't seem like a
good use of time.
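
(As a small illustration of the cache point, with a made-up sequence name: a
cache setting above 1 means each session preallocates its own block of
values, which is exactly what could not be kept consistent across workers.)

   create sequence seq_cached cache 10;

   -- This backend now holds values 1..10 in its local cache; a second
   -- backend calling nextval() would get 11, not 2.
   select nextval('seq_cached');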

 This may be a dumb question, but for functions do we know that all pl's
 besides C and SQL use SPI? If not I think they could end up writing in a
 worker.

Well, the lower-level checks would catch that.  But it is generally
true that there's no way to prevent arbitrary C code from doing things
that are unsafe in parallel mode and that we can't tell are unsafe.
As I've said before, I think that we'll need to have a method of
labeling functions as parallel-safe or not, but this patch isn't
trying to solve that part of the problem.

 @@ -2968,7 +2969,8 @@ ProcessInterrupts(void)
  errmsg("canceling statement due to user request")));
 }
 }
 -   /* If we get here, do nothing (probably, QueryCancelPending was
 reset) */
 +   if (ParallelMessagePending)
 +   HandleParallelMessageInterrupt(false);
 ISTM it'd be good to leave that comment in place (after the if).

 RestoreComboCIDState(): Assert(!comboCids || !comboHash): Shouldn't that be
 &&? AIUI both should always be either set or 0.

Fixed.

 Typo: + elog(ERROR, "cannot update SecondarySnapshpt during a
 parallel operation");

Fixed.

 The comment for RestoreSnapshot refers to set_transaction_snapshot, but that
 doesn't actually exist (or isn't referenced).

Fixed.

Will post a new version in a bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [RFC] LSN Map

2015-01-07 Thread Bruce Momjian
On Wed, Jan  7, 2015 at 12:33:20PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  Have you done any measurements to determine how much backup can be
  skipped using this method for a typical workload, i.e. how many 16MB
  page ranges are not modified in a typical span between incremental
  backups?
 
 That seems entirely dependent on the specific workload.

Well, obviously.  Is that worth even stating?  

My question is whether there are enough workloads for this to be
generally useful, particularly considering the recording granularity,
hint bits, and freezing.  Do we have cases where 16MB granularity helps
compared to file or table-level granularity?  How would we even measure
the benefits?  How would the administrator know they are benefitting
from incremental backups vs complete backups, considering the complexity
of incremental restores?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Fillfactor for GIN indexes

2015-01-07 Thread Alexander Korotkov
On Wed, Jan 7, 2015 at 4:11 PM, Michael Paquier michael.paqu...@gmail.com
wrote:

 On Wed, Dec 3, 2014 at 2:37 AM, Robert Haas robertmh...@gmail.com wrote:
  On Fri, Nov 28, 2014 at 4:27 AM, Alexander Korotkov
  aekorot...@gmail.com wrote:
  On Fri, Nov 21, 2014 at 8:12 AM, Michael Paquier 
 michael.paqu...@gmail.com
  wrote:
  Please find attached a simple patch adding fillfactor as storage
 parameter
  for GIN indexes. The default value is the same as the one currently
 aka 100
  to have the pages completely packed when a GIN index is created.
 
 
  That's not true. Let us discuss it a little bit.
  [blah discussion]
 That's quite a nice explanation. Thanks!

  My summary is following:
  1) In order to have fully correct support of fillfactor in GIN we need
 to
  rewrite GIN build algorithm.
  2) Without rewriting GIN build algorithm, not much can be done with
 entry
  tree. However, you can implement some heuristics.
 TBH, I am not really planning to rewrite the whole code.

  3) You definitely need to touch code that selects ratio of split in
  dataPlaceToPageLeaf (starting with if (!btree->isBuild)).
 OK I see, so for a split we need to have a calculation based on the
 fillfactor, with 75% by default.

  4) GIN data pages are always compressed excepts pg_upgraded indexes from
  pre-9.4. Take care about it in following code.
if (GinPageIsCompressed(page))
freespace = GinDataLeafPageGetFreeSpace(page);
  + else if (btree->isBuild)
  + freespace = BLCKSZ * (100 - fillfactor) / 100;
 Hm. Simply reversing both conditions is fine, no?

  This is a very interesting explanation; thanks for writing it up!
  It does leave me wondering why anyone would want fillfactor for GIN at
  all, even if they were willing to rewrite the build algorithm.
 Based on the explanation of Alexander, the current GIN code fills in a
 page at 75% for a split, and was doing even 50/50 pre-9.4 if I recall
 correctly. IMO a higher fillfactor makes sense for a GIN index that
 gets less random updates, no?

 I am attaching an updated patch, with the default fillfactor value at
 75%, and with the page split code using the fillfactor rate.
 Thoughts?


A rewritten version of the patch is attached. I made the following changes:

1) I removed fillfactor handling from the entry tree, because in this case
the fillfactor parameter would only be an upper bound for actual page
fullness. It's very much like the GiST approach to fillfactor. But I would
rather remove fillfactor from GiST than spread such an approach to other AMs.
2) Fillfactor handling for posting trees is fixed.
3) The default fillfactor for GIN is reverted to 90. I didn't mean that the
default fillfactor should be 75%. I meant that the equilibrium state of the
tree would be about 75% fullness. But that doesn't mean that we don't want
indexes to be better packed just after creation. Anyway, I don't see a
reason why the default fillfactor for the GIN btree should differ from the
default fillfactor of a regular btree.

--
With best regards,
Alexander Korotkov.


gin_fillfactor_3.patch
Description: Binary data



Re: [HACKERS] Fillfactor for GIN indexes

2015-01-07 Thread Alexander Korotkov
On Thu, Jan 8, 2015 at 12:31 AM, Alexander Korotkov aekorot...@gmail.com
wrote:

 A rewritten version of the patch is attached. I made the following changes:

 1) I removed fillfactor handling from the entry tree, because in this case
 the fillfactor parameter would only be an upper bound for actual page
 fullness. It's very much like the GiST approach to fillfactor. But I would
 rather remove fillfactor from GiST than spread such an approach to other AMs.
 2) Fillfactor handling for posting trees is fixed.
 3) The default fillfactor for GIN is reverted to 90. I didn't mean that the
 default fillfactor should be 75%. I meant that the equilibrium state of the
 tree would be about 75% fullness. But that doesn't mean that we don't want
 indexes to be better packed just after creation. Anyway, I don't see a
 reason why the default fillfactor for the GIN btree should differ from the
 default fillfactor of a regular btree.


I just forgot the simple case that I used to test the patch:

create table test as (select array[1,2,3] v from
generate_series(1,100));
create index test_idx20 on test using gin(v) with (fillfactor=20);
create index test_idx30 on test using gin(v) with (fillfactor=30);
create index test_idx40 on test using gin(v) with (fillfactor=40);
create index test_idx50 on test using gin(v) with (fillfactor=50);
create index test_idx60 on test using gin(v) with (fillfactor=60);
create index test_idx70 on test using gin(v) with (fillfactor=70);
create index test_idx80 on test using gin(v) with (fillfactor=80);
create index test_idx90 on test using gin(v) with (fillfactor=90);
create index test_idx100 on test using gin(v) with (fillfactor=100);

                       List of relations
 Schema |    Name     | Type  | Owner  | Table |  Size   | Description
--------+-------------+-------+--------+-------+---------+-------------
 public | test_idx20  | index | smagen | test  | 16 MB   |
 public | test_idx30  | index | smagen | test  | 11 MB   |
 public | test_idx40  | index | smagen | test  | 8152 kB |
 public | test_idx50  | index | smagen | test  | 6520 kB |
 public | test_idx60  | index | smagen | test  | 5176 kB |
 public | test_idx70  | index | smagen | test  | 4480 kB |
 public | test_idx80  | index | smagen | test  | 3928 kB |
 public | test_idx90  | index | smagen | test  | 3520 kB |
 public | test_idx100 | index | smagen | test  | 3184 kB |
(9 rows)

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] KNN-GiST with recheck

2015-01-07 Thread Alexander Korotkov
On Tue, Dec 16, 2014 at 4:37 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 Patch attached. It should be applied on top of my pairing heap patch at
 http://www.postgresql.org/message-id/548ffa2c.7060...@vmware.com. Some
 caveats:

 * The signature of the distance function is unchanged, it doesn't get a
 recheck argument. It is just assumed that if the consistent function sets
 the recheck flag, then the distance needs to be rechecked as well. We might
 want to add the recheck argument, like you Alexander did in your patch, but
 it's not important right now.


I didn't get how that is expected to work if we have only an order by qual
without a filter qual. In this case the consistent function just isn't
called at all.

* I used the distance term in the executor, although the ORDER BY expr
 machinery is more general than that. The value returned by the ORDER BY
 expression doesn't have to be a distance, although that's the only thing
 supported by GiST and the built-in opclasses.

 * I short-circuited the planner to assume that the ORDER BY expression
 always returns a float. That's true today for knn-GiST, but is obviously a
 bogus assumption in general.

 This needs some work to get into a committable state, but from a
 modularity point of view, this is much better than having the indexam to
 peek into the heap.


Nice idea to put the reordering into the index scan node. It doesn't look
like overengineering. I'm going to bring it to a more committable state.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-07 Thread Peter Eisentraut
On 1/6/15 7:33 PM, Josh Berkus wrote:
 D. Weird name: for those doing only replication, not PITR,
 recovery.conf is completely baffling.

I don't disagree, but standby.enabled or whatever isn't any better,
for the inverse reason.

But replication and PITR are the same thing, so any name is going to
have that problem.

One way out of that would be to develop higher-level abstractions, like
pg_ctl start -m standby or something, similar to how pg_ctl promote is
an abstraction and got people away from fiddling with trigger files.





Re: [HACKERS] VODKA?

2015-01-07 Thread Arthur Silva
On Jan 6, 2015 7:14 PM, Josh Berkus j...@agliodbs.com wrote:

 Oleg, Teodor:

 I take it VODKA is sliding to version 9.6?

 --
 Josh Berkus
 PostgreSQL Experts Inc.
 http://pgexperts.com



This is kinda off-topic, but I was wondering if anyone ever considered
running a crowd-funding campaign for this sort of feature (aka potentially
very popular).


Re: [HACKERS] parallel mode and parallel contexts

2015-01-07 Thread Simon Riggs
On 7 January 2015 at 13:11, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 6, 2015 at 4:37 PM, Simon Riggs si...@2ndquadrant.com wrote:
 So when you say Only the top frame of the transaction state stack is
 copied you don't mean the top, you mean the bottom (the latest
 subxact)? Which then becomes the top in the parallel worker? OK...

 The item most recently added to the stack is properly called the top,
 but I guess it's confusing in this case because the item on the bottom
 of the stack is referred to as the TopTransaction.  I'll see if I can
 rephrase that.

Yes, I didn't mean to suggest the confusion was introduced by you -
it's just you stepped into the mess by correctly using the word top in
a place where its meaning would be opposite to the close-by usage of
TopTransaction.

Anyway, feeling good about this now. Thanks for your patience.

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [RFC] LSN Map

2015-01-07 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Bruce Momjian wrote:
 Have you done any measurements to determine how much backup can be
 skipped using this method for a typical workload, i.e. how many 16MB
 page ranges are not modified in a typical span between incremental
 backups?

 That seems entirely dependent on the specific workload.

Maybe, but it's a reasonable question.  The benefit obtained from the
added complexity/overhead clearly goes to zero if you summarize too much,
and it's not at all clear that there's a sweet spot where you win.  So
I'd want to see some measurements demonstrating that this is worthwhile.

regards, tom lane




Re: [HACKERS] pg_rewind in contrib

2015-01-07 Thread Alvaro Herrera
What is this define good for?  I couldn't understand where it fits; is
it just a leftover?

+#define pageinfo_set_truncation(forkno, rnode, blkno) datapagemap_set_truncation(pagemap, forkno, rnode, blkno)


Please don't name files with generic names such as util.c/h.  It's
annoying later; for instance when I want to open analyze.c I have to
choose the one in parser or commands.  (pg_upgrade in particular is
already a mess; let's not copy that.)

Is the final destiny of this still contrib?  There are several votes for
putting it in src/bin/ right from the start, which sounds reasonable to
me as well.  We didn't put pg_basebackup in contrib initially for instance.
If we do that, we will need more attention to translatability markers of
user-visible error messages; if you want to continue using fprintf()
then you will need _() around the literals, but it might be wise to use
another function instead so that you can avoid them (say, have something
like pg_upgrade's pg_fatal() which you can then list in nls.mk).
...
Uh, I notice that you have _() in some places such as slurpFile().

I agree with Andres that it would be better to avoid a second copy of
the xlogreader helper stuff.  A #FRONTEND define in xlogreader.c seems
acceptable, and pg_rewind can have a wrapper function to add extra
functionality later, if necessary -- or we can add a flags argument,
currently unused, which could later be extended to indicate to read from
archive, or something.

Copyright lines need to be updated to 2015 (meh)

Our policy on header inclusion says that postgres_fe.h comes first, then
system includes, then other postgres headers.  So at least this one is
wrong:

+++ b/contrib/pg_rewind/copy_fetch.c
@@ -0,0 +1,586 @@
[...]
+#include "postgres_fe.h"
+
+#include "catalog/catalog.h"
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <dirent.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "pg_rewind.h"
+#include "fetch.h"
+#include "filemap.h"
+#include "datapagemap.h"
+#include "util.h"

The reason for this IIRC is that system includes could modify behavior
of some calls in obscure platforms under obscure circumstances.


+   while ((xlde = readdir(xldir)) != NULL)
+   {

You need to reset errno here, see commit 6f03927fce038096f53ca67eeab9a.

+static int dstfd = -1;
+static char dstpath[MAXPGPATH] = "";
+
+void
+open_target_file(const char *path, bool trunc)
+{

I think these file-level static vars ought to be declared at top of
file, probably with more distinctive names.

+void
+close_target_file(void)
+{
+   if (close(dstfd) != 0)
+   {
+   fprintf(stderr, "error closing destination file \"%s\": %s\n",
+   dstpath, strerror(errno));
+   exit(1);
+   }
+
+   dstfd = -1;
+   /* fsync? */
+}

Did you find an answer for the above question?


+static bool
+endswith(const char *haystack, const char *needle)
+{
+   int needlelen = strlen(needle);
+   int haystacklen = strlen(haystack);
+
+   if (haystacklen < needlelen)
+   return false;
+
+   return strcmp(&haystack[haystacklen - needlelen], needle) == 0;
+}

We already have this function elsewhere.

+   if (type != FILE_TYPE_REGULAR && isRelDataFile(path))
+   {
+   fprintf(stderr, "data file in source \"%s\" is a directory.\n", path);
+   exit(1);
+   }

Here you have error messages ending with periods; but not in most other
places.

+/*
+ * Does it look like a relation data file?
+ */
+static bool
+isRelDataFile(const char *path)

This doesn't consider forks ... why not?  If correct, probably it deserves a 
comment.

+struct filemap_t
+{
+   /*
+* New entries are accumulated to a linked list, in process_remote_file
+* and process_local_file.
+*/
+   file_entry_t *first;
+   file_entry_t *last;
+   int nlist;

Uh, this is like the seventh open-coded list implementation in frontend
code.  Can't we have this using ilist for a change?


Existing practice is that SQL keywords are uppercase:

+   sql =
+   -- Create a recursive directory listing of the whole data 
directory\n
+   with recursive files (path, filename, size, isdir) as (\n
+ select '' as path, filename, size, isdir from\n
+ (select pg_ls_dir('.') as filename) as fn,\n
+   pg_stat_file(fn.filename) as this\n
+ union all\n
+ select parent.path || parent.filename || '/' as path,\n
+fn, this.size, this.isdir\n
+ from files as parent,\n
+  pg_ls_dir(parent.path || parent.filename) as fn,\n
+  pg_stat_file(parent.path || parent.filename || '/' || 
fn) as this\n
+  where parent.isdir = 't'\n
+   )\n
+   -- Using the cte, fetch a listing of the all the files.\n
+   --\n
+   -- For 

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2015-01-07 Thread Kevin Grittner
Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote:

 I don't see why would my patch cause inconsistencies. It can
 cause dangling PREPAREd transactions and I have already
 acknowledged that fact.

 Am I missing something?

To me that is the big problem.  Where I have run into ad hoc
distributed transaction managers it has usually been because a
crash left prepared transactions dangling, without cleaning them up
when the transaction manager was restarted.  This tends to wreak
havoc one way or another.

If we are going to include a distributed transaction manager with
PostgreSQL, it *must* persist enough information about the
transaction ID and where it is used in a way that will survive a
subsequent crash before beginning the PREPARE on any of the
systems.  After all nodes are PREPAREd it must flag that persisted
data to indicate that it is now at a point where ROLLBACK is no
longer an option.  Only then can it start committing the prepared
transactions.  After the last node is committed it can clear this
information.  On start-up the distributed transaction manager must
check for any distributed transactions left in progress and
commit or rollback based on the preceding; doing retries
indefinitely until it succeeds or is told to stop.
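
A rough sketch of the building blocks such a transaction manager would drive
on each node (the table, data, and GID naming scheme below are invented, and
max_prepared_transactions must be set above zero for any of this to work):

   begin;
   update payments set amount = amount - 100 where id = 1;
   prepare transaction 'dtx_42_node_a';   -- phase one: work is persisted

   -- Only after every participant has prepared, and the coordinator has
   -- durably recorded that the transaction is now commit-only:
   commit prepared 'dtx_42_node_a';       -- phase two

   -- After a crash, in-doubt transactions show up here and must be
   -- resolved with COMMIT PREPARED or ROLLBACK PREPARED:
   select gid, prepared, owner, database from pg_prepared_xacts;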

Doing this incompletely (i.e., not identifying and correctly
handling the various failure modes) is IMO far worse than not
attempting it.  If we could build in something that did this
completely and well, that would be a cool selling point; but let's
not gloss over the difficulties.  We must recognize how big a
problem it would be to include a low-quality implementation.

Also, as previously mentioned, it must behave in some reasonable
way if a database is not configured to support 2PC, especially
since 2PC is off by default in PostgreSQL.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] explain sortorder

2015-01-07 Thread Timmer, Marius
Hi,

we have spent the last few days implementing your suggestions in the patch.
It affects the result of an EXPLAIN statement (even in non-verbose mode). Now
you will get the ordering information for every single sort key that is not
ordered by the defaults.


best regards,

Marius




---
Marius Timmer
Zentrum für Informationsverarbeitung
Westfälische Wilhelms-Universität Münster
Einsteinstraße 60

mtimm...@uni-muenster.de


explain_sortorder-v6.patch
Description: explain_sortorder-v6.patch



[HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes

2015-01-07 Thread Aaron Botsis
Hi folks, I was having a problem importing json data with COPY. Lots of things export data nicely as one json blob per line. This is excellent for directly importing into a JSON/JSONB column for analysis... except when there’s an embedded doublequote. Or anything that’s escaped. COPY handles this, but by the time the escaped char hits the JSON parser, it's not escaped anymore. This breaks the JSON parsing. This means I need to manipulate the input data to double-escape it. See bug #12320 for an example. Yuck.

I propose this small patch that simply allows specifying COPY … ESCAPE without requiring the CSV parser. It will make it much easier to directly use json formatted export data for folks going forward. This seemed like the simplest route.

Usage is simply:

postgres=# copy t1 from '/Users/nok/Desktop/queries.json';
ERROR: invalid input syntax for type json
DETAIL: Token "root" is invalid.
CONTEXT: JSON data, line 1: ...1418066241619 AND =1418671041621) AND user:"root...
COPY t1, line 3, column bleh: "{"timestamp":"2014-12-15T19:17:32.505Z","duration":7.947,"query":{"query":{"filtered":{"filter":{"qu..."

postgres=# copy t1 from '/Users/nok/Desktop/queries.json' escape '';
COPY 1966

I’ve included regression tests, and all existing tests pass. This is my first contribution, so be kind to me. :)

escape-without-csv.patch
Description: Binary data
Begin forwarded message:

From: Francisco Olarte fola...@peoplecall.com
Date: January 6, 2015 at 1:52:28 PM EST
Subject: Re: [BUGS] BUG #12320: json parsing with embedded double quotes
To: Aaron Botsis aa...@bt-r.com
Cc: postg...@bt-r.com, "pgsql-b...@postgresql.org" pgsql-b...@postgresql.org

Hi Aaron:

On Tue, Jan 6, 2015 at 7:06 PM, Aaron Botsis aa...@bt-r.com wrote:

 Hi Francisco, I’m aware, but still consider this to be a bug, or at least a great opportunity for an enhancement. :)

Maybe, but you are going to have a problem.

 This had bitten me for the third time while trying to import some json data. It’d be great to bypass the copy escaping (and possibly other meta characters) when the column type is json or jsonb. I’d be happy to try and write it and submit a patch if folks believe this is an acceptable way to go… That said, I should probably read what the process is for this kind of thing :)

Reading this, you are talking about 'the column being json'. COPY needs to do the escaping at the same time it's constructing the columns. The present way is easy to do: read char by char; if it's an escape, process the next char, accumulating into the current field; otherwise see whether it is a field or record separator and act accordingly. It's also layered: when you construct the records for copy you get all the field data, turn them into escaped strings, join them by the field separator and spit them out followed by a record separator (practical implementations may do this virtually). Intermixing this with the 'I'm in a json column' case would need to pass information from the upper layer, and make it more difficult and, especially, error prone. What do you do if (using the standard delimiters) your json value has embedded newlines and tabs (which, IIRC, are legal in several places inside the json)? And all this to make some incorrectly formatted files readable (which can be correctly formatted with a perl one-liner or something similar). I'm not the one to decide, but I will vote against including that (but do not trust me too much, I would also vote against including 'csv' which I consider the root of many evils).

Francisco Olarte.


Re: [HACKERS] Patch: [BUGS] BUG #12320: json parsing with embedded double quotes

2015-01-07 Thread Aaron Botsis
Ack. Original had a bug. Let’s try again.

escape-without-csv-v2.patch
Description: Binary data
On Jan 7, 2015, at 8:25 AM, Aaron Botsis aa...@bt-r.com wrote:

 [...]


Re: [HACKERS] 9.5: Better memory accounting, towards memory-bounded HashAgg

2015-01-07 Thread Tomas Vondra
Hi,

On 23.12.2014 10:16, Jeff Davis wrote:
 It seems that these two patches are being reviewed together. Should
 I just combine them into one? My understanding was that some wanted
 to review the memory accounting patch separately.
 
 On Sun, 2014-12-21 at 20:19 +0100, Tomas Vondra wrote:
 That's the only conflict, and after fixing it it compiles OK.
 However, I got a segfault on the very first query I tried :-(
 
 If lookup_hash_entry doesn't find the group, and there's not enough 
 memory to create it, then it returns NULL; but the caller wasn't 
 checking for NULL. My apologies for such a trivial mistake, I was
 doing most of my testing using DISTINCT. My fix here was done
 quickly, so I'll take a closer look later to make sure I didn't miss
 something else.
 
 New patch attached (rebased, as well).

I did a review today, using these two patches:

* memory-accounting-v9.patch (submitted on December 2)
* hashagg-disk-20141222.patch

I started with some basic performance measurements comparing hashagg
queries without and with the patches (while you compared hashagg and
sort). That's IMHO an interesting comparison, especially when no
batching is necessary - in the optimal case the users should not see any
slowdown (we shouldn't make them pay for the batching unless it actually
is necessary).

So I did this:

drop table if exists test_hash_agg;

create table test_hash_agg as
select
i AS a,
mod(i,100) AS b,
mod(i,10) AS c,
mod(i,1) AS d,
mod(i,1000) AS e,
mod(i,100) AS f
from generate_series(1,10000000) s(i);

vacuum (full, analyze) test_hash_agg;

i.e. a ~500MB table with 10M rows, and columns with different
cardinalities. And then queries like this:

   select count(*) from (select a, count(a) from test_hash_agg
 group by a) foo;

   -- 10M groups (OOM)
   select count(*) from (select a, array_agg(a) from test_hash_agg
 group by a) foo;

   -- 100 groups
   select count(*) from (select f, array_agg(f) from test_hash_agg
 group by f) foo;

which performed quite well, i.e. I've seen absolutely no slowdown.
Which, in the array_agg case, is quite suspicious, because on
every lookup_hash_entry call, it has to do MemoryContextMemAllocated()
on 10M contexts, and I really doubt that can be done in ~0 time.

So I started digging in the code and I noticed this:

hash_mem = MemoryContextMemAllocated(aggstate->hashcontext, true);

which is IMHO obviously wrong, because that accounts only for the
hashtable itself. It might be correct for aggregates with state passed
by value, but it's incorrect for state passed by reference (e.g.
Numeric, arrays etc.), because initialize_aggregates does this:

oldContext = MemoryContextSwitchTo(aggstate->aggcontext);
pergroupstate->transValue = datumCopy(peraggstate->initValue,
                                      peraggstate->transtypeByVal,
                                      peraggstate->transtypeLen);
MemoryContextSwitchTo(oldContext);

and it's also wrong for all the user-defined aggregates that have no
access to hashcontext at all, and only get a reference to aggcontext using
AggCheckCallContext(). array_agg() is a prime example.

In those cases the patch actually does no memory accounting and as
hashcontext has no child contexts, there's no accounting overhead.

After fixing this bug (replacing hashcontext with aggcontext in both
calls to MemoryContextMemAllocated) the performance drops dramatically.
For the query with 100 groups (not requiring any batching) I see this:

test=# explain analyze select count(x) from (select f, array_agg(1) AS x
from test_hash_agg group by f) foo;

QUERY PLAN (original patch)

 Aggregate  (cost=213695.57..213695.58 rows=1 width=32)
            (actual time=2539.156..2539.156 rows=1 loops=1)
   ->  HashAggregate  (cost=213693.07..213694.32 rows=100 width=4)
                      (actual time=2492.264..2539.012 rows=100 loops=1)
         Group Key: test_hash_agg.f
         Batches: 1  Memory Usage: 24kB  Disk Usage: 0kB
         ->  Seq Scan on test_hash_agg
                 (cost=0.00..163693.71 rows=871 width=4)
                 (actual time=0.022..547.379 rows=1000 loops=1)
 Planning time: 0.039 ms
 Execution time: 2542.932 ms
(7 rows)

QUERY PLAN (fixed patch)

 Aggregate  (cost=213695.57..213695.58 rows=1 width=32)
            (actual time=5670.885..5670.885 rows=1 loops=1)
   ->  HashAggregate  (cost=213693.07..213694.32 rows=100 width=4)
                      (actual time=5623.254..5670.803 rows=100 loops=1)
         Group Key: test_hash_agg.f
         Batches: 1  Memory Usage: 117642kB  Disk Usage: 0kB
         ->  Seq Scan on 

Re: [HACKERS] [PATCH] explain sortorder

2015-01-07 Thread Mike Blackwell
V6 of this patch applies, builds and checks against the current HEAD.  The
areas below could use some attention.

In explain.c:

  malloc() should not be called directly here.  palloc() would be the
correct call, I believe, but the functions in stringinfo.h are probably
your best choice as they remove the necessity for dealing with buffer size
and overflow.

  There is leftover commented out code from the previous patch version in
the T_Sort case.

  In show_sort_group_keys(), the splitting of the existing declaration and
initialization of the keyresno and target seems unnecessary and against the
style of surrounding code.

  Multi-line comments should follow the existing format.

There are no tests for the ... is LC_COLLATE and COLLATE... cases.

Section 14.1 of the documentation may need to be updated.
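
Regarding the missing COLLATE tests, a query along these lines (hypothetical
table, shown only as a sketch) would exercise per-key ordering options that
differ from the defaults:

   create table collate_test (a text, b int);

   explain (verbose)
   select * from collate_test
   order by a collate "C" desc nulls last, b;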


Mike.

Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
mike.blackw...@rrd.com
http://www.rrdonnelley.com



On Wed, Jan 7, 2015 at 10:17 AM, Timmer, Marius 
marius.tim...@uni-muenster.de wrote:

  Hi,

 we have spent the last few days implementing your suggestions in the patch.
 It affects the result of an EXPLAIN statement (even in non-verbose mode).
 Now you will get the ordering information for every single sort key that is
 not ordered by the defaults.


 best regards,

 Marius




 ---
 Marius Timmer
 Zentrum für Informationsverarbeitung
 Westfälische Wilhelms-Universität Münster
 Einsteinstraße 60

 mtimm...@uni-muenster.de



Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-07 Thread Stephen Frost
* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 On 6 January 2015 at 20:44, Peter Geoghegan p...@heroku.com wrote:
  Another issue that I see is that there is only one resultRelation
  between the INSERT and UPDATE. That means that without some additional
  special measure, both UPDATE and INSERT WITH CHECK ( ... )  options
  are applied regardless of whether the INSERT path was taken, or the
  UPDATE path. Maybe that's actually sensible (note that even this
  doesn't happen right now, since the auxiliary UPDATE is currently
  minimally processed by the rewriter). What do people think about that?
  It seems like it might be less surprising to have that fail earlier
  when there are separate policies for INSERT and UPDATE -- for UPSERT,
  all policies are applied regardless of which path is taken.
 
 I think the policies applied should depend on the path taken, so if it
 does an INSERT, then only the INSERT CHECK policy should be applied
 (after the insert), but if it ends up doing an UPDATE, I would expect
 the UPDATE USING policy to be applied (before the update) and the
 UPDATE CHECK policy to be applied (after the update). I would not
 expect the INSERT CHECK policy to be applied on the UPDATE path.

You're making a distinction where I'm not convinced there should be one.
While I understand that, internally, there are two paths (INSERT vs.
UPDATE), it's still only one command for the user and their goal is
simply update if this exists, insert if it doesn't.  I also don't like
the idea that the policy to be applied depends on if it ends up being an
INSERT or an UPDATE.  I liked Peter's suggestion that the row must pass
both WITH CHECK conditions- at least that's consistent and
understandable.
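
To make the cases concrete, this is the sort of setup under discussion (the
table, policies, and data are invented; the INSERT .. ON CONFLICT UPDATE
syntax is the proposed one from the patch and may still change, so it is
shown commented out):

   create table accounts (id int primary key, owner text, balance numeric);
   alter table accounts enable row level security;

   create policy accounts_ins on accounts for insert
       with check (owner = current_user);

   create policy accounts_upd on accounts for update
       using (owner = current_user)
       with check (balance >= 0);

   -- The question is which of the three expressions above apply when a
   -- statement like this takes the INSERT path versus the UPDATE path:
   -- insert into accounts values (1, current_user, 100)
   --     on conflict update set balance = accounts.balance + 100;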

 As to whether the UPDATE USING policy should cause an error to be
 thrown if it is not satisfied, my inclination would be to not error,
 and use the command tag to report that no rows were updated, since
 that is what would happen with a regular UPDATE.

Yes, for a regular UPDATE that's what would happen- but this isn't a
regular UPDATE, it's an UPSERT.  Consider how users handle this now-
they have routines which continually try to do one or the other until
either the INSERT or the UPDATE is successful.  I've never seen such a
function, properly written, that says try to INSERT, if that fails, try
to UPDATE and if that doesn't update anything, then simply exit.  What
is the use-case for that?

 So overall INSERT .. ON CONFLICT UPDATE would be consistent with
 either an INSERT or an UPDATE, depending on whether the row existed
 beforehand, which is easier to explain than having some special UPSERT
 semantics.

Any semantics which result in a no-op would be surprising for users of
UPSERT.  I like the idea of failing earlier- if you try to INSERT .. ON
CONFLICT UPDATE a row which you're not allowed to INSERT, erroring
strikes me as the right result.  If you try to INSERT .. ON CONFLICT
UPDATE a row which you're not allowed to UPDATE, I'd also think an error
would be the right answer, even if you don't end up doing an actual
UPDATE- you ran a command asking for an UPDATE to happen under a certain
condition (the row already exists) and the policy states that you're not
allowed to do such an UPDATE.

As for the UPDATE's USING clause, if you're not allowed to view the
records to be updated (which evidently exists because the INSERT
failed), I'd handle that the same way we handle the case that a SELECT
policy might not allow a user to see rows that they can attempt to
INSERT- they'll get an error back in that case too.

In the end, these are edge cases as policies are generally self
consistent and so I think we have some leeway, but I don't think we have
the right to turn an INSERT .. ON CONFLICT UPDATE into a no-op.  That
would be like allowing an INSERT to be a no-op.

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] Re: [COMMITTERS] pgsql: Change how first WAL segment on new timeline after promotion is

2015-01-07 Thread Heikki Linnakangas

On 01/06/2015 03:22 PM, Andres Freund wrote:

On 2015-01-05 18:45:22 +0200, Heikki Linnakangas wrote:

On 01/03/2015 08:59 PM, Andres Freund wrote:

Did you perhaps intend to use XLogFileInit(use_existing = true)
instead of XLogFileOpen()? That works for me.


Hmm, that doesn't sound right either. XLogFileInit is used when you switch
to a new segment, not to open an old segment for writing. It happens to
work, because with use_existing = true it will in fact always open the old
segment, instead of creating a new one, but I don't think that's in the
spirit of how that function's intended to be used.


Well, its docs say "Create a new XLOG file segment, or open a
pre-existing one.", so it's not that mismatched. We really don't know
whether the EndOfLog's segment already exists in this scenario.  Also,
doesn't XLogWrite() essentially use it in the same way?


XLogWrite() is smarter. It uses XLogFileInit() when switching to a new 
segment, and XLogFileOpen when writing to the middle of a segment.


Committed the fix to not open the segment at all.

- Heikki





Re: [HACKERS] Turning recovery.conf into GUCs

2015-01-07 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 On 2015-01-06 17:08:20 -0800, Josh Berkus wrote:
 On 01/06/2015 04:42 PM, Andres Freund wrote:
 On 2015-01-06 16:33:36 -0800, Josh Berkus wrote:
 F. Inability to remaster without restarting the replica.

 That has pretty much nothing to do with the topic at hand.

 It has *everything* to do with the topic at hand.  The *only* reason we
 can't remaster without restarting is that recovery.conf is only read at
 startup time, and can't be reloaded.

 http://www.databasesoup.com/2014/05/remastering-without-restarting.html

 Not really. It's a good way to introduce subtle and hard-to-understand
 corruption and other strange corner cases. Your replication connection
 was lost/reconnected in the wrong moment? Oops, received/replayed too
 much.

 A real solution to this requires more. You need to 1) disable writing
 any wal 2) force catchup of all possible standbys, including apply. Most
 importantly of the new master. This requires knowing which standbys
 exist. 3) promote new master. 4) only now allow reconnects.

I'm not following.  As long as each standby knows what point it is
at in the transaction stream, it could make a request to any
transaction source for transactions past that point.  If a node
which will be promoted to the new master isn't caught up to that
point, it should not send anything.  When it does get caught up it
could start sending transactions past that point, including the
timeline switch when it is promoted.

If you were arguing that some changes are needed besides *just* allowing
a standby to read from a new source without a restart, OK -- some
changes might also be needed for a promoted master to behave as
described above; but certainly the ability to read WAL from a new
source on the fly would be a prerequisite, and possibly the
biggest hurdle to clear.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] INSERT ... ON CONFLICT UPDATE and RLS

2015-01-07 Thread Robert Haas
On Wed, Jan 7, 2015 at 4:04 AM, Dean Rasheed dean.a.rash...@gmail.com wrote:
 I think the policies applied should depend on the path taken, so if it
 does an INSERT, then only the INSERT CHECK policy should be applied
 (after the insert), but if it ends up doing an UPDATE, I would expect
 the UPDATE USING policy to be applied (before the update) and the
 UPDATE CHECK policy to be applied (after the update). I would not
 expect the INSERT CHECK policy to be applied on the UPDATE path.

I agree.

 As to whether the UPDATE USING policy should cause an error to be
 thrown if it is not satisfied, my inclination would be to not error,
 and use the command tag to report that no rows were updated, since
 that is what would happen with a regular UPDATE.

My inclination would be to error, but I'd be OK with your proposal.

 So overall INSERT .. ON CONFLICT UPDATE would be consistent with
 either an INSERT or an UPDATE, depending on whether the row existed
 beforehand, which is easier to explain than having some special UPSERT
 semantics.

Yeah.  We won't escape the question so easily where triggers are
concerned, but at least for RLS it seems like it should be possible to
avoid confusing, one-off semantics.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] HINTing on UPDATE foo SET foo.bar = ..;

2015-01-07 Thread Alvaro Herrera
We're waiting for an updated version here, right?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] HINTing on UPDATE foo SET foo.bar = ..;

2015-01-07 Thread Marko Tiikkaja

On 1/7/15 6:22 PM, Alvaro Herrera wrote:

We're waiting for an updated version here, right?


Yeah.  (The CF entry is also set to Waiting on Author, which seems 
appropriate.)



.marko

