Re: [HACKERS] A Modest Upgrade Proposal

2016-07-27 Thread David Fetter
On Sun, Jul 17, 2016 at 02:55:20PM -0400, Jan Wieck wrote:
> On Sun, Jul 17, 2016 at 2:08 PM, Jim Nasby  wrote:
> > On 7/13/16 2:06 PM, Joshua D. Drake wrote:
> >> On 07/07/2016 01:01 PM, Robert Haas wrote:
> >> There was an unconference session on this topic at PGCon and quite a
> >>> number of people there stated that they found DDL to be an ease-of-use
> >>> feature and wanted to have it.
> >>
> >> Yeah, I haven't met anyone yet who would like to have:
> >>
> >> select replicate_these_relations('{public}');
> >>
> >> vs:
> >>
> >> ALTER SCHEMA public ENABLE REPLICATION;
> >>
> >> (or something like that).
> >>
> >
> > I generally agree, but I think the more important question is "Why?". Is
> > it because DDL looks more like a sentence? Is it because arrays are a PITA?
> > Is it too hard to call functions?
> 
> Once you get fine grained enough to support replicating different
> sets of possibly overlapping objects/namespaces to different groups
> of recipients, the DDL approach becomes just as convoluted as
> calling functions and nobody will memorize the entire syntax.

I don't see this as an actual problem.  I've written parts of the
SELECT syntax, but I haven't memorized even all of that.

DDL doesn't promise to be more complicated or easier to get wrong than
function calls, as far as I can tell.  The opposite could well be
true.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] LWLocks in DSM memory

2016-07-27 Thread Thomas Munro
On Tue, Jul 26, 2016 at 6:00 AM, Andres Freund  wrote:
> I think the better fix here would actually be to get rid of a pointer
> based list here, and a) replace the queue with integer offsets ...

Here is a prototype of that idea.  It replaces that dlist with a
proclist, a new specialised doubly-linked list for pgprocno values,
using INVALID_PGPROCNO for termination.  It works with proclist_node
objects inside PGPROC at an arbitrary offset which must be provided
when you initialise the proclist.
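
To sketch the shape of it (illustrative only; apart from the existing
GetPGProcByNumber() macro, the names here are not necessarily what the
patch uses):

/* A list link embedded in each PGPROC, using pgprocno values as "pointers". */
typedef struct proclist_node
{
	int			next;	/* pgprocno of next member, or INVALID_PGPROCNO */
	int			prev;	/* pgprocno of previous member, or INVALID_PGPROCNO */
} proclist_node;

/* A list header, safe to place in shared memory (including DSM). */
typedef struct proclist_head
{
	int			head;	/* pgprocno of first member, or INVALID_PGPROCNO */
	int			tail;	/* pgprocno of last member, or INVALID_PGPROCNO */
} proclist_head;

/* Resolve a member's node, given the proclist_node's offset within PGPROC. */
#define proclist_node_get(procno, node_offset) \
	((proclist_node *) ((char *) GetPGProcByNumber(procno) + (node_offset)))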

It could be made more general than just the PGPROC array by taking the
base address and stride of the array, perhaps even allowing for a
different base address in each backend for arrays that live in DSM,
but I happened to see https://xkcd.com/974/ today and didn't pursue
that thought.

> ... b) make
> the queue lock-free.  But I understand that you might not want to tackle
> that :P

This may be complicated by the not-strictly-queue-like access pattern.
I haven't looked into it yet, but I can see that it requires some
serious study (Harris or Zhang techniques, maybe with extra tagging to
avoid ABA problems on concurrent removal of the same node, which might
in turn require a wider CAS than we have?).

I wonder if a proclist with an optional lock-free interface would also
be interesting for syncrep queues and others.

-- 
Thomas Munro
http://www.enterprisedb.com


lwlocks-in-dsm-v2.patch
Description: Binary data



Re: [HACKERS] A Modest Upgrade Proposal

2016-07-27 Thread Michael Paquier
On Thu, Jul 28, 2016 at 10:22 AM, Craig Ringer  wrote:
> It might, actually. One approach for online upgrade is to:
>
> * pg_basebackup the master
> * start the replica and let it catch up
> * create a logical replication slot on the master
> * replace the recovery.conf on the basebackup so it stops recovery at the
> lsn of the replication slot's confirmed_flush_lsn
> * stop the replica and pg_upgrade it
> * have the upgraded replica, now a master, replay from the old master over
> logical replication
> * once caught up, switch over
>
> This means a full dump and reload with a full rebuild of all indexes, etc,
> isn't needed. All shared catalog stuff is copied (until we switch to logical
> rep for the final catch-up).

This is per-database logic to perform an upgrade of a single
database, right? If a cluster has multiple databases you need one
logical slot per database to complete an upgrade, which is where
synchronous_standby_names, now able to take multiple entries, helps
as well to ensure that the former master is in sync with all the
logical slots in place.
-- 
Michael




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-27 Thread Craig Ringer
On 28 July 2016 at 04:35, Bruce Momjian  wrote:

> On Fri, Jul  8, 2016 at 12:18:28AM +0100, Simon Riggs wrote:
> > On 7 July 2016 at 21:10, Robert Haas  wrote:
> >
> > pg_upgrade does that, kinda.  I'd like to have something better, but
> > in the absence of that, I think it's quite wrong to think about
> > deprecating it, even if we had logical replication fully integrated
> > into core today.  Which we by no means do.
> >
> > I don't see any problem with extending pg_upgrade to use logical replication
> > features under the covers.
> >
> > It seems very smooth to be able to just say
> >
> >pg_upgrade --online
> >
> > and then specify whatever other parameters that requires.
> >
> > It would be much easier to separate out that as a use-case so we can be sure we
> > get that in 10.0, even if nothing else lands.
>
> Uh, while "pg_upgrade --online" looks cool, I am not sure a solution
> based on logical replication would share _any_ code with the existing
> pg_upgrade tool, so it seems best to use another binary for this.
>

It might, actually. One approach for online upgrade is to:

* pg_basebackup the master
* start the replica and let it catch up
* create a logical replication slot on the master
* replace the recovery.conf on the basebackup so it stops recovery at
the lsn of the replication slot's confirmed_flush_lsn
* stop the replica and pg_upgrade it
* have the upgraded replica, now a master, replay from the old master over
logical replication
* once caught up, switch over

This means a full dump and reload with a full rebuild of all indexes, etc,
isn't needed. All shared catalog stuff is copied (until we switch to
logical rep for the final catch-up).

> I guess we could use the pg_dump/pg_restore pg_upgrade code to create
> the objects, and use logical replication to copy the rows, but what does
> this gain us that pg_dump/pg_restore doesn't?


A consistent switch-over point, where the upgrade can happen while the
master is still writing.

We create a slot, dump from the slot's exported snapshot, and switch over
to logical replication consistently at the end of the dump.

That's pretty much what BDR and pglogical do.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-07-27 Thread Kouhei Kaigai
> On 2016/07/27 13:51, Kouhei Kaigai wrote:
> > This output is, at least, not incorrect.
> > This ForeignScan actually scans a relation that consists of two joined
> > tables on the remote side. So, not incorrect, but perhaps not convenient
> > for users who don't have deep internal knowledge.
> 
> Hmm.
> 
> > On the other hand, I cannot be happy with your patch because it concludes
> > that the role of ForeignScan/CustomScan with scanrelid==0 is always a join.
> > However, that does not cover all the cases.
> >
> > When postgres_fdw gets support for remote partial aggregates, we can implement
> > the feature using ForeignScan with scanrelid==0. Is it a join? No.
> 
> Yeah, I think that that could be implemented in both cases (scanrelid>0
> and scanrelid=0).
> 
> > Probably, the core cannot know the exact purpose of a ForeignScan/CustomScan
> > with scanrelid==0. It can be a remote partial aggregation. It can be an
> > alternative sort logic by GPU. It can be something else.
> > So, I think the extension needs to give the core a more proper label to show,
> > including a flag to control whether the relation(s) name shall be attached
> > next to the ForeignScan/CustomScan line.
> >
> > I'd like to suggest adding the two fields below:
> >  - An alternative label the extension wants to show instead of the default one
> 
> How about adding something like the "Additional Operations" line as
> proposed in a previous email, instead?  As you know, FDWs/Extensions
> could add such information through ExplainForeignScan/ExplainCustomScan.
>
No. What I'm saying is this:


EXPLAIN (COSTS false, VERBOSE)
SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY  
t1.c3, t1.c1 OFFSET 100 LIMIT 10;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Limit
   Output: t1.c1, t2.c1, t1.c3
   ->  Foreign Scan
         Output: t1.c1, t2.c1, t1.c3
         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
         Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
(6 rows)


Your concern is that EXPLAIN shows the ForeignScan node as "Foreign Scan"
although it actually works as a "Foreign Join".

My suggestion makes EXPLAIN print "Foreign %s" according to the label
assigned by the extension. Only the FDW driver knows what this ForeignScan
node actually performs, thus only the FDW driver can set a meaningful label.

This label may be "Join", may be "Partial Aggregate + Join", or something
else, according to the actual job of this node.
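
In other words, something like the following attached to the plan node
(purely an illustrative sketch; these field names are invented here, not an
existing or proposed API spelling), filled in by the FDW or extension when
it builds the ForeignScan/CustomScan, and consumed by core explain.c when
it prints the node title:

/* Illustrative only */
typedef struct ScanDisplayHints
{
	const char *label;		/* e.g. "Join" or "Partial Aggregate + Join";
							 * printed as "Foreign %s" / "Custom %s",
							 * NULL means the default "Scan" */
	bool		print_rels;	/* should core append the relation names? */
} ScanDisplayHints;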

> >  - A flag to turn on/off printing relation(s) name
> 
> ISTM that the relation information should be always printed even in the
> case of scanrelid=0 by the core, because that would be essential for
> analyzing EXPLAIN results.
>
We have no information about which relations are actually scanned by ForeignScan
and CustomScan. Your patch abuses fs_relids and custom_relids; however,
these fields don't indicate the relations actually scanned by these nodes.
They just track the relations processed by this node or its children.

See the example below. This GpuJoin on CustomScan takes three SeqScan nodes
as the inner/outer sources of the join. GpuJoin never actually scans the
tables; it picks up the records generated by the underlying SeqScan nodes.


postgres=# explain select id from t0 natural join t1 natural join t2;
QUERY PLAN
---
 Custom Scan (GpuJoin)  (cost=12385.67..291245.35 rows=9815465 width=4)
   GPU Projection: t0.id
   Depth 1: GpuHashJoin, HashKeys: (t0.bid)
JoinQuals: (t0.bid = t2.bid)
Nrows (in/out: 98.15%), KDS-Hash (size: 13.47MB, nbatches: 1)
   Depth 2: GpuHashJoin, HashKeys: (t0.aid)
JoinQuals: (t0.aid = t1.aid)
Nrows (in/out: 100.00%), KDS-Hash (size: 13.47MB, nbatches: 1)
   ->  Seq Scan on t0  (cost=0.00..18.96 rows=996 width=12)
   ->  Seq Scan on t2  (cost=0.00..1935.00 rows=10 width=4)
   ->  Seq Scan on t1  (cost=0.00..1935.00 rows=10 width=4)
(11 rows)


If the EXPLAIN command attaches something like "on t0,t1,t2" after the GpuJoin
*with no choice*, it is problematic and misleading.
It should be controllable by the extension, which knows which tables are
actually scanned. (Please note that I never said the extension makes the string;
that is a job of core explain. I only suggest it needs to be controllable.)


Even though I recommended a boolean flag to turn this printing on or off, it may
need to be a bitmap indicating which underlying relations should be printed
by the EXPLAIN command, because we can easily imagine a ForeignScan/CustomScan
node that scans only some of its child tables. For example, GpuJoin could be
implemented to scan t1 and t2 by itself but take a SeqScan on t0.

> >  - A flag to turn on/off printing 

Re: [HACKERS] old_snapshot_threshold allows heap:toast disagreement

2016-07-27 Thread Andres Freund
Hi,

On 2016-07-26 15:13:41 -0400, Robert Haas wrote:
> On Thu, Jul 21, 2016 at 12:06 PM, Robert Haas  wrote:
> > On Wed, Jul 20, 2016 at 9:15 PM, Noah Misch  wrote:
> >> This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
> >> committer like to take ownership?  If this role interests you, please read
> >> this thread and the policy linked above, then send an initial status update
> >> bearing a date for your subsequent status update.  If the item does not 
> >> have a
> >> permanent owner by 2016-07-24 02:00 UTC, I will resolve the item by 
> >> reverting
> >> commit 848ef42 and followups.
> >
> > I will adopt this item.  I will provide an initial patch for this
> > issue, or convince someone else to do so, within one week.  Therefore,
> > expect a further status update from me on or before July 28th.  I
> > expect that the patch will be based on ideas from these emails:
> >
> > https://www.postgresql.org/message-id/1ab8f80a-d16e-4154-9497-98fbb1642...@anarazel.de
> > https://www.postgresql.org/message-id/20160720181213.f4io7gc6lyc37...@alap3.anarazel.de
> 
> Here is a patch.

Cool!


Why did you decide to introduce MaximumXLogRecPtr? Wouldn't just using
InvalidXLogRecPtr do the trick? That already prevents errors.


> Please review.  I'm not suffering from an overwhelming excess of
> confidence that this is correct.  In particular, I'm concerned about
> the fact that there isn't always a registered snapshot - if you make
> it ERROR out when that happens instead of doing what it actually does,
> the regression tests fail in CLUSTER and CREATE VIEW.

Right.


> Also, I wonder why it's right to use
> pairingheap_first() instead of looking at the oldest snapshot on the
> active snapshot stack, or conversely why that is right and this is
> not.  Or maybe we need to check both.

Well, generally, the older the snapshot we use is, the more we'll error
out spuriously. The reason to use the oldest from the pairing heap is
that that'll be the most conservative value, right?  If there's neither
an active, nor a registered snapshot, we'll not prevent pruning in the
first place (and xmin ought to be invalid).  As registered snapshots are
usually "older" than active ones, we definitely have to check those. But
you're right, it seems safer to also check active ones - which squares
with SnapshotResetXmin().


> The basic principle of testing both MVCC and TOAST snapshots for
> snapshot-too-old problems seems sound.  Furthermore, it seems clear
> enough that if we're ever looking up a datum in the TOAST table whose
> xmin is no longer included in our MyPgXact->xmin, then we're
> vulnerable to having TOAST chunks vacuumed away under us even without
> snapshot_too_old enabled.

Exactly.


> But there's a bit of spooky action at a
> distance: if we don't see any snapshots, then we have to assume that
> the scan is being performed with a non-MVCC snapshot and therefore we
> don't need to worry.  But there's no way to verify that via an
> assertion, because the connection between the relevant MVCC snapshot
> and the corresponding TOAST snapshot is pretty indirect, mediated only
> by snapmgr.c.

Hm. Could we perhaps assert that the session has a valid xmin?
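
Something along these lines, perhaps (just a sketch of the idea):

    /* sketch: the pruning horizon must be pinned before we fetch TOAST chunks */
    Assert(TransactionIdIsValid(MyPgXact->xmin));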


Greetings,

Andres Freund




Re: [HACKERS] Fixing target-column indirection in INSERT with multiple VALUES

2016-07-27 Thread Tom Lane
Kevin Grittner  writes:
> On Wed, Jul 27, 2016 at 1:47 PM, Tom Lane  wrote:
>> this would require an initdb because it changes the
>> representation of stored rules for cases like this,

> So pg_upgrade would not work at all for the version this goes into,

No, pg_upgrade wouldn't have a problem.  The entire point of pg_upgrade
is to cope with system catalog changes.

regards, tom lane




Re: [HACKERS] Fixing target-column indirection in INSERT with multiple VALUES

2016-07-27 Thread Kevin Grittner
On Wed, Jul 27, 2016 at 1:47 PM, Tom Lane  wrote:

> I looked into the problem reported in bug #14265,

> attached is a proposed patch that fixes this bug

> this would require an initdb because it changes the
> representation of stored rules for cases like this,

So pg_upgrade would not work at all for the version this goes into,
or do you see us testing for such cases and allowing pg_upgrade if
none are found?  Or do you see a way to allow pg_upgrade in the
face of such stored rules?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why we lost Uber as a user

2016-07-27 Thread Josh Berkus
On 07/26/2016 08:45 PM, Robert Haas wrote:
> That's why I found Josh's restatement useful - I am assuming without
> proof that his restatement is accurate

FWIW, my restatement was based on some other sites rather than Uber,
including folks who didn't abandon Postgres.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] A Modest Upgrade Proposal

2016-07-27 Thread Bruce Momjian
On Fri, Jul  8, 2016 at 12:18:28AM +0100, Simon Riggs wrote:
> On 7 July 2016 at 21:10, Robert Haas  wrote:
> 
> pg_upgrade does that, kinda.  I'd like to have something better, but
> in the absence of that, I think it's quite wrong to think about
> deprecating it, even if we had logical replication fully integrated
> into core today.  Which we by no means do.
> 
> I don't see any problem with extending pg_upgrade to use logical replication
> features under the covers.
> 
> It seems very smooth to be able to just say
> 
>    pg_upgrade --online 
> 
> and then specify whatever other parameters that requires.
> 
> It would be much easier to separate out that as a use-case so we can be sure we
> get that in 10.0, even if nothing else lands.

Uh, while "pg_upgrade --online" looks cool, I am not sure a solution
based on logical replication would share _any_ code with the existing
pg_upgrade tool, so it seems best to use another binary for this.  

I guess we could use the pg_dump/pg_restore pg_upgrade code to create
the objects, and use logical replication to copy the rows, but what does
this gain us that pg_dump/pg_restore doesn't?  Wouldn't you just create
the standby using logical replication and switch over?  Why use
pg_upgrade at all?  Am I missing something?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Rethinking TupleTableSlot deforming

2016-07-27 Thread Alvaro Herrera
Andres Freund wrote:
> Hi,
> 
> I've previously mentioned (e.g. [1]) that tuple deforming is a serious
> bottlneck. I've also experimented successfully [2] making
> slot_deform_tuple() faster.

I'm currently messing with making heapam.c be just one possible
implementation of tuple storage, to allow other modules to implement
tuple storage in different ways.  Each such module would have a pg_am
row, and among the set of routines that such a module would have to
provide is the equivalent of slot_deform_tuple, which takes a tuple in
whatever format the storage module uses and puts it into
executor-formatted tts_values/tts_isnull arrays.  For tuples coming from
heapam.c-formatted tables that would be pretty much slot_deform_tuple,
but of course other modules could have tuples in completely different
formats.

(This is one part in the larger project of columnar storage, where a
storage module would store tuples in some columnar format rather than
row-oriented.  Obviously, tuple deforming means a completely different
thing in that world than in heapam.c).

No patch to show at present; I mentioned this project in Brussels.
I intend to present it for 10.0.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Fixing target-column indirection in INSERT with multiple VALUES

2016-07-27 Thread Tom Lane
I looked into the problem reported in bug #14265,
https://www.postgresql.org/message-id/20160727005725.7438.26...@wrigleys.postgresql.org

The core of the problem is a design decision that seems pretty bad in
hindsight: if we have array-subscripting or field-selection decoration
in the target column list of an INSERT with multiple VALUES rows, then
we apply transformAssignedExpr() to each element of each VALUES row
independently.  So for example in
INSERT INTO foo (col[1]) VALUES (1), (2), (3)
there are going to be three identical ArrayRef nodes (with different
input expressions) inside the VALUES RTE, and then just a Var referencing
the VALUES RTE in the top-level targetlist.  While that works, it's not
so good when we have
INSERT INTO foo (col[1], col[2]) VALUES (1, 2), (3, 4)
because the rewriter needs to merge the assignments to "col" by nesting
the ArrayRefs together, but it fails to find the ArrayRefs in the
top-level targetlist, leading to the complained-of error.

Other reasons not to like this design are the space taken by the redundant
copies of the ArrayRef nodes, and the ugly logic needed in ruleutils.c
to deal with this situation, which is unlike either the INSERT-with-
single-VALUES-row or the INSERT-SELECT cases.

So attached is a proposed patch that fixes this bug by moving the ArrayRef
and/or FieldStore nodes associated with the column target list into the
top-level targetlist.  I still have it calling transformAssignedExpr
for each VALUES expression, which means that the parser still generates
(and then throws away) ArrayRef/FieldStore nodes for each VALUES row when
there is indirection decoration in the target list.  It would probably be
possible to avoid that, but it would take very invasive refactoring of
transformAssignmentIndirection and allied routines, and I'm dubious that
the case comes up often enough to be worth complicating that code even
more.

Another issue is that this would require an initdb because it changes the
representation of stored rules for cases like this, which means we could
only fix it in HEAD and there wouldn't be a back-patch.  Given that it's
been like this forever and we've not had complaints before, I think that's
acceptable.

Conceivably we could fix this without initdb-forcing changes in the
parser's output, if we taught the rewriter to apply rewriteTargetListIU
to each row of the VALUES RTE rather than to the top-level targetlist.
But that would be a complex mess for multiple reasons --- for instance,
it would change the output column set for the RTE.  So I'm not at all
excited about pursuing that path.

In short, I propose applying the attached to HEAD but not attempting
to fix the issue in the back branches.

regards, tom lane

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 29c8c4e..eac86cc 100644
*** a/src/backend/parser/analyze.c
--- b/src/backend/parser/analyze.c
*** post_parse_analyze_hook_type post_parse_
*** 51,57 
  static Query *transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt);
  static Query *transformInsertStmt(ParseState *pstate, InsertStmt *stmt);
  static List *transformInsertRow(ParseState *pstate, List *exprlist,
!    List *stmtcols, List *icolumns, List *attrnos);
  static OnConflictExpr *transformOnConflictClause(ParseState *pstate,
  		  OnConflictClause *onConflictClause);
  static int	count_rowexpr_columns(ParseState *pstate, Node *expr);
--- 51,58 
  static Query *transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt);
  static Query *transformInsertStmt(ParseState *pstate, InsertStmt *stmt);
  static List *transformInsertRow(ParseState *pstate, List *exprlist,
!    List *stmtcols, List *icolumns, List *attrnos,
!    bool strip_indirection);
  static OnConflictExpr *transformOnConflictClause(ParseState *pstate,
  		  OnConflictClause *onConflictClause);
  static int	count_rowexpr_columns(ParseState *pstate, Node *expr);
*** transformInsertStmt(ParseState *pstate, 
*** 619,625 
  		/* Prepare row for assignment to target table */
  		exprList = transformInsertRow(pstate, exprList,
  	  stmt->cols,
! 	  icolumns, attrnos);
  	}
  	else if (list_length(selectStmt->valuesLists) > 1)
  	{
--- 620,627 
  		/* Prepare row for assignment to target table */
  		exprList = transformInsertRow(pstate, exprList,
  	  stmt->cols,
! 	  icolumns, attrnos,
! 	  false);
  	}
  	else if (list_length(selectStmt->valuesLists) > 1)
  	{
*** transformInsertStmt(ParseState *pstate, 
*** 663,672 
  			exprLocation((Node *) sublist;
  			}
  
! 			/* Prepare row for assignment to target table */
  			sublist = transformInsertRow(pstate, sublist,
  		 stmt->cols,
! 		 icolumns, attrnos);
  
  			/*
  			 * We must assign collations now because assign_query_collations
--- 665,684 
  			exprLocation((Node *) sublist;
  	

Re: [HACKERS] Macro customizable hashtable / bitmapscan & aggregation perf

2016-07-27 Thread Andres Freund
On 2016-07-27 10:04:52 -0400, Robert Haas wrote:
> On Tue, Jul 26, 2016 at 8:43 PM, Andres Freund  wrote:
> > As previously mentioned, dynahash isn't particularly fast. In many cases
> > that doesn't particularly matter. But e.g. hash aggregation and bitmap
> > index scans are quite sensitive to hash performance.
> >
> > The biggest problems with dynahash (also discussed at [1]) are
> >
> > a) two level structure doubling the amount of indirect lookups
> > b) indirect function calls
> > c) using separate chaining based conflict resolution
> > d) being too general.
> 
> I am ... skeptical about open addressing.  It's less unappealing for
> this application than in general because we don't actually need to
> delete anything, but that wouldn't be true for the catcache.

I don't think deletions really change the picture for open addressing -
you don't need tombstones, you can instead move elements closer to
their origin at deletion. I just hadn't implemented that yet, because it
didn't seem like a crucial point.
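
To illustrate what "moving elements closer to their origin" means,
tombstone-free deletion in a linear-probing table looks roughly like this
(a simplified sketch with invented types and names, not the patch's actual
code; uint32/bool as provided by c.h):

typedef struct
{
	uint32		hash;
	bool		used;
	/* key/value payload elided */
} OASlot;

typedef struct
{
	OASlot	   *slots;
	uint32		mask;		/* table size - 1; size is a power of two */
} OAHash;

/* Delete the entry in bucket i, shifting later chain members back.
 * Assumes the table is never completely full. */
static void
oahash_delete_at(OAHash *ht, uint32 i)
{
	uint32		j = i;

	for (;;)
	{
		uint32		k;

		ht->slots[i].used = false;

		/* find the next entry that is allowed to move into the hole */
		for (;;)
		{
			j = (j + 1) & ht->mask;
			if (!ht->slots[j].used)
				return;			/* end of the probe chain: done */
			k = ht->slots[j].hash & ht->mask;	/* its ideal bucket */
			/* it may move unless its ideal bucket lies cyclically in (i, j] */
			if (i <= j ? (k <= i || k > j) : (k <= i && k > j))
				break;
		}

		ht->slots[i] = ht->slots[j];	/* shift it closer to its origin */
		i = j;
	}
}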


Unfortunately open addressing, particularly linear probing, has such vastly
superior cache access patterns that it's hard to come close with a
chained hash table. I've previously tried a chained hash table, and the
performance gains were a lot smaller.  For that reason many hash-table
implementations used in performance-critical paths appear to use open
addressing.


Don't get me wrong, I do certainly think that there's good cases for
using separate chaining in places where performance is less of a
concern; and possibly when you want lock-free concurrency.


> All the same, I feel that we need to assess the risk that we're
> improving average-case performance while creating large regressions in
> other cases (i.e. where there is clustering).

The actual algorithmic worst-case complexity isn't different. It's
obviously important to resize appropriately. It's not that hard to beat
dynahash's memory efficiency, so fillfactors don't have to be kept super
high to not regress in memory usage.


> Do likely() and unlikely() actually buy us anything here?

It's only a few percent here, but overall, especially when used around
error checks, it's above 10%. The reason I used it here is primarily that
it made the assembly easier to read for me... ;)
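
(For reference, the macros themselves are nothing magical; roughly the
following, assuming gcc/clang, with a no-op fallback elsewhere:)

#if defined(__GNUC__)
#define likely(x)	__builtin_expect((x) != 0, 1)
#define unlikely(x)	__builtin_expect((x) != 0, 0)
#else
#define likely(x)	(x)
#define unlikely(x)	(x)
#endif

/* typical use: if (unlikely(entry == NULL)) elog(ERROR, "..."); */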

Greetings,

Andres Freund




Re: [HACKERS] [PATCH v12] GSSAPI encryption support

2016-07-27 Thread Robbie Harwood
Michael Paquier  writes:

> On Wed, Jul 27, 2016 at 12:22 AM, Robbie Harwood  wrote:
>> Michael Paquier  writes:
>>
>> So there's a connection setting `sslmode` that we'll want something
>> similar to here (`gssapimode` or so).  `sslmode` has six settings, but I
>> think we only need three for GSSAPI: "disable", "allow", and "prefer"
>> (which presumably would be the default).
>
> Seeing the debate regarding sslmode these days, I would not say that
> "prefer" would be the default, but that's an implementation detail.
>
>> Lets suppose we're working with "prefer".  GSSAPI will itself check two
>> places for credentials: client keytab and ccache.  But if we don't find
>> credentials there, we as the client have two options on how to proceed.
>>
>> - First, we could prompt for a password (and then call
>>   gss_acquire_cred_with_password() to get credentials), presumably with
>>   an empty password meaning to skip GSSAPI.  My memory is that the
>>   current behavior for GSSAPI auth-only is to prompt for password if we
>>   don't find credentials (and if it isn't, there's no reason not to
>>   unless we're opposed to handling the password).
>>
>> - Second, we could skip GSSAPI and proceed with the next connection
>>   method.  This might be confusing if the user is then prompted for a
>>   password and expects it to be for GSSAPI, but we could probably make
>>   it sensible.  I think I prefer the first option.
>
> Ah, right. I completely forgot that GSSAPI had its own handling of
> passwords for users registered to it...
>
> Isn't this distinction a good point for not implementing "prefer",
> "allow" or any equivalents? By that I mean that we should not have any
> GSS connection mode that fallbacks to something else if the first one
> fails. So we would live with the two following modes:
> - "disable", to only try a non-GSS connection
> - "enable", or "require", to only try a GSS connection.
> That seems quite acceptable to me as a first implementation to just
> have that.

If it is the password management that is scary here, we could have a
prefer-type mode which does not prompt, but only uses existing
credentials.  Or we could opt to never prompt, which is totally valid.




Re: [HACKERS] Why we lost Uber as a user

2016-07-27 Thread Bruce Momjian
On Wed, Jul 27, 2016 at 08:33:52AM -0500, Merlin Moncure wrote:
> > Or in short, this seems like an annoyance, not a time-for-a-new-database
> > kind of problem.
> 
> Well, the real annoyance as I understand it is the raw volume of bytes
> of WAL traffic a single update of a field can cause.  They switched to
> statement level replication(!).

Well, their big complaint about binary replication is that a bug can
spread from a master to all slaves, which doesn't happen with statement
level replication.  If that type of corruption is your primary worry,
and you can ignore the worries about statement level replication, then
it makes sense.  Of course, the big tragedy is that statement level
replication has known unfixable(?) failures, while binary replication
failures are caused by developer-introduced bugs.

In some ways, people worry about the bugs they have seen, not the bugs
they haven't seen.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-07-27 Thread Peter van Hardenberg
On Tue, Jul 26, 2016 at 6:15 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 7/26/16 6:14 PM, Vik Fearing wrote:
> > As mentioned elsewhere in the thread, you can just do WHERE true to get
> > around it, so why on Earth have it PGC_SUSET?
>
> I'm not sure whether it's supposed to guard against typos and possibly
> buggy SQL string concatenation in application code.  So it would help
> against accidental mistakes, whereas putting a WHERE TRUE in there would
> be an intentional override.
>
>
I know I'm late to the thread here, but I just wanted to add my small voice
in support of this feature.

Over the years we've seen this happen at Heroku quite a bit (accidental
manual entry without a where clause), and the only minor gripe I'd have is
that contrib modules are very undiscoverable and users tend not to find out
about them.

On the other hand, a session setting in core would probably not be that
different.

I expect Heroku will probably wind up enabling this by default on any
interactive psql sessions.

(And I would encourage packagers and distributors to consider doing the same.)

-p


Re: [HACKERS] [Patch] RBTree iteration interface improvement

2016-07-27 Thread Aleksander Alekseev
And as always, I didn't re-check a patch before sending it :) Here is a
corrected version without any fprintf's.

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index d6422ea..eac76c4 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -255,7 +255,7 @@ qsortCompareItemPointers(const void *a, const void *b)
 void
 ginBeginBAScan(BuildAccumulator *accum)
 {
-	rb_begin_iterate(accum->tree, LeftRightWalk);
+	rb_begin_left_right_walk(accum->tree, &accum->tree_walk);
 }
 
 /*
@@ -271,7 +271,7 @@ ginGetBAEntry(BuildAccumulator *accum,
 	GinEntryAccumulator *entry;
 	ItemPointerData *list;
 
-	entry = (GinEntryAccumulator *) rb_iterate(accum->tree);
+	entry = (GinEntryAccumulator *) rb_left_right_walk(&accum->tree_walk);
 
 	if (entry == NULL)
 		return NULL;			/* no more entries */
diff --git a/src/backend/lib/rbtree.c b/src/backend/lib/rbtree.c
index 4fa8a1d..eb645a2 100644
--- a/src/backend/lib/rbtree.c
+++ b/src/backend/lib/rbtree.c
@@ -871,3 +871,278 @@ rb_iterate(RBTree *rb)
 
 	return rb->iterate(rb);
 }
+
+/*
+ * Begin left right walk.
+ */
+void
+rb_begin_left_right_walk(RBTree *rb, RBTreeLeftRightWalk * lrw)
+{
+	lrw->rb = rb;
+	lrw->last_visited = NULL;
+	lrw->is_over = (lrw->rb->root == RBNIL);
+}
+
+/*
+ * Left right walk: get next node. Returns NULL if there is none.
+ */
+RBNode *
+rb_left_right_walk(RBTreeLeftRightWalk * lrw)
+{
+	if (lrw->is_over)
+		return NULL;
+
+	if (lrw->last_visited == NULL)
+	{
+		lrw->last_visited = lrw->rb->root;
+		while (lrw->last_visited->left != RBNIL)
+			lrw->last_visited = lrw->last_visited->left;
+
+		return lrw->last_visited;
+	}
+
+	if (lrw->last_visited->right != RBNIL)
+	{
+		lrw->last_visited = lrw->last_visited->right;
+		while (lrw->last_visited->left != RBNIL)
+			lrw->last_visited = lrw->last_visited->left;
+
+		return lrw->last_visited;
+	}
+
+	for (;;)
+	{
+		RBNode	   *came_from = lrw->last_visited;
+
+		lrw->last_visited = lrw->last_visited->parent;
+		if (lrw->last_visited == NULL)
+		{
+			lrw->is_over = true;
+			break;
+		}
+
+		if (lrw->last_visited->left == came_from)
+			break;/* came from left sub-tree, return current
+ * node */
+
+		/* else - came from right sub-tree, continue to move up */
+	}
+
+	return lrw->last_visited;
+}
+
+/*
+ * Begin right left walk.
+ */
+void
+rb_begin_right_left_walk(RBTree *rb, RBTreeRightLeftWalk * rlw)
+{
+	rlw->rb = rb;
+	rlw->last_visited = NULL;
+	rlw->is_over = (rlw->rb->root == RBNIL);
+}
+
+/*
+ * Right left walk: get next node. Returns NULL if there is none.
+ */
+RBNode *
+rb_right_left_walk(RBTreeRightLeftWalk * rlw)
+{
+	if (rlw->is_over)
+		return NULL;
+
+	if (rlw->last_visited == NULL)
+	{
+		rlw->last_visited = rlw->rb->root;
+		while (rlw->last_visited->right != RBNIL)
+			rlw->last_visited = rlw->last_visited->right;
+
+		return rlw->last_visited;
+	}
+
+	if (rlw->last_visited->left != RBNIL)
+	{
+		rlw->last_visited = rlw->last_visited->left;
+		while (rlw->last_visited->right != RBNIL)
+			rlw->last_visited = rlw->last_visited->right;
+
+		return rlw->last_visited;
+	}
+
+	for (;;)
+	{
+		RBNode	   *came_from = rlw->last_visited;
+
+		rlw->last_visited = rlw->last_visited->parent;
+		if (rlw->last_visited == NULL)
+		{
+			rlw->is_over = true;
+			break;
+		}
+
+		if (rlw->last_visited->right == came_from)
+			break;/* came from right sub-tree, return current
+ * node */
+
+		/* else - came from left sub-tree, continue to move up */
+	}
+
+	return rlw->last_visited;
+}
+
+/*
+ * Begin direct walk (a.k.a pre-order traversal).
+ *
+ * Pre-order traversal while duplicating nodes and edges can make a complete
+ * duplicate of a binary tree. It can also be used to make a prefix expression
+ * (Polish notation) from expression trees: traverse the expression tree
+ * pre-orderly.
+ *
+ * https://en.wikipedia.org/wiki/Tree_traversal#Applications
+ */
+void
+rb_begin_direct_walk(RBTree *rb, RBTreeDirectWalk * dw)
+{
+	dw->rb = rb;
+	dw->last_visited = NULL;
+	dw->is_over = (dw->rb->root == RBNIL);
+}
+
+/*
+ * Direct walk: get next node. Returns NULL if there is none.
+ */
+RBNode *
+rb_direct_walk(RBTreeDirectWalk * dw)
+{
+	if (dw->is_over)
+		return NULL;
+
+	if (dw->last_visited == NULL)
+	{
+		dw->last_visited = dw->rb->root;
+		return dw->last_visited;
+	}
+
+	if (dw->last_visited->left != RBNIL)
+	{
+		dw->last_visited = dw->last_visited->left;
+		return dw->last_visited;
+	}
+
+	do
+	{
+		if (dw->last_visited->right != RBNIL)
+		{
+			dw->last_visited = dw->last_visited->right;
+			break;
+		}
+
+		/* go up and one step right */
+		for (;;)
+		{
+			RBNode	   *came_from = dw->last_visited;
+
+			dw->last_visited = dw->last_visited->parent;
+			if (dw->last_visited == NULL)
+			{
+dw->is_over = true;
+break;
+			}
+
+			if ((dw->last_visited->right != came_from) && (dw->last_visited->right != RBNIL))
+			{
+dw->last_visited = dw->last_visited->right;
+return 

Re: [HACKERS] Curing plpgsql's memory leaks for statement-lifespan values

2016-07-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jul 25, 2016 at 6:04 PM, Tom Lane  wrote:
>> I suppose that a fix based on putting PG_TRY blocks into all the affected
>> functions might be simple enough that we'd risk back-patching it, but
>> I don't really want to go that way.

> try/catch blocks aren't completely free, either, and PL/pgsql is not
> suffering from a deplorable excess of execution speed.

BTW, just to annotate that a bit: I did some measurements and found out
that on my Linux box, creating/deleting a memory context
(AllocSetContextCreate + MemoryContextDelete) is somewhere around 10x
more expensive than a PG_TRY block.  This means that the PG_TRY approach
would actually be faster for cases involving only a small number of
statements-needing-local-storage within a single plpgsql function
execution.  However, the memory context creation cost is amortized across
repeated executions of a statement, whereas of course PG_TRY won't be.
We can roughly estimate that PG_TRY would lose any time we loop through
the statement in question more than circa ten times.  So I believe the
way I want to do it will win speed-wise in cases where it matters, but
it's not entirely an open-and-shut conclusion.
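
For concreteness, the two patterns being compared look roughly like this
(an illustrative sketch only, not what the eventual patch does):

/* (a) per-statement temporary memory context */
static void
work_with_context(void)
{
	MemoryContext tmp;
	MemoryContext old;

	tmp = AllocSetContextCreate(CurrentMemoryContext,
								"statement workspace",
								ALLOCSET_DEFAULT_MINSIZE,
								ALLOCSET_DEFAULT_INITSIZE,
								ALLOCSET_DEFAULT_MAXSIZE);
	old = MemoryContextSwitchTo(tmp);
	/* ... allocate and work freely ... */
	MemoryContextSwitchTo(old);
	MemoryContextDelete(tmp);	/* everything is released in one go */
}

/* (b) PG_TRY-based cleanup of an explicitly tracked allocation */
static void
work_with_try(void)
{
	void	   *buf = palloc(8192);

	PG_TRY();
	{
		/* ... work that may elog(ERROR) ... */
	}
	PG_CATCH();
	{
		pfree(buf);
		PG_RE_THROW();
	}
	PG_END_TRY();
	pfree(buf);
}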

Anyway, there are enough other reasons not to go the PG_TRY route.

regards, tom lane




[HACKERS] [Patch] RBTree iteration interface improvement

2016-07-27 Thread Aleksander Alekseev
Hello

While working on a new feature for PostgreSQL (which is still in
development and is irrelevant in this context) I noticed that the current
RBTree implementation's iteration interface is the following:

```
extern void rb_begin_iterate(RBTree *rb, RBOrderControl ctrl);
extern RBNode *rb_iterate(RBTree *rb);
```

As you can see, it doesn't allow multiple simultaneous iterations over a
single tree. I believe that's a major flaw for a general-purpose container.

The proposed patch introduces a new iteration interface that doesn't have
such a flaw. Naturally I wrote some unit tests, but I was not sure where
exactly to place them in the PostgreSQL source code. For now I've just
uploaded them to GitHub:

https://github.com/afiskon/c-algorithms-examples/blob/master/rbtree_example.c

According to these tests, the new implementation works as fast as the
current one and produces exactly the same results.

I didn't dare to remove the current interface since in theory some
extensions may use it. Still, I believe such a move is worth considering.
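
For illustration, the new interface keeps the walk state in a caller-provided
struct, so several walks over the same tree can be in flight at once (a usage
sketch; `tree` stands for an existing RBTree *):

```
RBTreeLeftRightWalk walk_a;
RBTreeLeftRightWalk walk_b;
RBNode	   *node;

rb_begin_left_right_walk(tree, &walk_a);
rb_begin_left_right_walk(tree, &walk_b);	/* second, independent walk */

while ((node = rb_left_right_walk(&walk_a)) != NULL)
{
	/* process node; walk_b's position is unaffected */
}
```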

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index d6422ea..eac76c4 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -255,7 +255,7 @@ qsortCompareItemPointers(const void *a, const void *b)
 void
 ginBeginBAScan(BuildAccumulator *accum)
 {
-	rb_begin_iterate(accum->tree, LeftRightWalk);
+	rb_begin_left_right_walk(accum->tree, &accum->tree_walk);
 }
 
 /*
@@ -271,7 +271,7 @@ ginGetBAEntry(BuildAccumulator *accum,
 	GinEntryAccumulator *entry;
 	ItemPointerData *list;
 
-	entry = (GinEntryAccumulator *) rb_iterate(accum->tree);
+	entry = (GinEntryAccumulator *) rb_left_right_walk(&accum->tree_walk);
 
 	if (entry == NULL)
 		return NULL;			/* no more entries */
diff --git a/src/backend/lib/rbtree.c b/src/backend/lib/rbtree.c
index 4fa8a1d..cbe256f 100644
--- a/src/backend/lib/rbtree.c
+++ b/src/backend/lib/rbtree.c
@@ -871,3 +871,278 @@ rb_iterate(RBTree *rb)
 
 	return rb->iterate(rb);
 }
+
+/*
+ * Begin left right walk.
+ */
+void
+rb_begin_left_right_walk(RBTree *rb, RBTreeLeftRightWalk * lrw)
+{
+	lrw->rb = rb;
+	lrw->last_visited = NULL;
+	lrw->is_over = (lrw->rb->root == RBNIL);
+}
+
+/*
+ * Left right walk: get next node. Returns NULL if there is none.
+ */
+RBNode *
+rb_left_right_walk(RBTreeLeftRightWalk * lrw)
+{
+	if (lrw->is_over)
+		return NULL;
+
+	if (lrw->last_visited == NULL)
+	{
+		lrw->last_visited = lrw->rb->root;
+		while (lrw->last_visited->left != RBNIL)
+			lrw->last_visited = lrw->last_visited->left;
+
+		return lrw->last_visited;
+	}
+
+	if (lrw->last_visited->right != RBNIL)
+	{
+		lrw->last_visited = lrw->last_visited->right;
+		while (lrw->last_visited->left != RBNIL)
+			lrw->last_visited = lrw->last_visited->left;
+
+		return lrw->last_visited;
+	}
+
+	for (;;)
+	{
+		RBNode	   *came_from = lrw->last_visited;
+
+		lrw->last_visited = lrw->last_visited->parent;
+		if (lrw->last_visited == NULL)
+		{
+			lrw->is_over = true;
+			break;
+		}
+
+		if (lrw->last_visited->left == came_from)
+			break;/* came from left sub-tree, return current
+ * node */
+
+		/* else - came from right sub-tree, continue to move up */
+	}
+
+	return lrw->last_visited;
+}
+
+/*
+ * Begin right left walk.
+ */
+void
+rb_begin_right_left_walk(RBTree *rb, RBTreeRightLeftWalk * rlw)
+{
+	rlw->rb = rb;
+	rlw->last_visited = NULL;
+	rlw->is_over = (rlw->rb->root == RBNIL);
+}
+
+/*
+ * Right left walk: get next node. Returns NULL if there is none.
+ */
+RBNode *
+rb_right_left_walk(RBTreeRightLeftWalk * rlw)
+{
+	if (rlw->is_over)
+		return NULL;
+
+	if (rlw->last_visited == NULL)
+	{
+		rlw->last_visited = rlw->rb->root;
+		while (rlw->last_visited->right != RBNIL)
+			rlw->last_visited = rlw->last_visited->right;
+
+		return rlw->last_visited;
+	}
+
+	if (rlw->last_visited->left != RBNIL)
+	{
+		rlw->last_visited = rlw->last_visited->left;
+		while (rlw->last_visited->right != RBNIL)
+			rlw->last_visited = rlw->last_visited->right;
+
+		return rlw->last_visited;
+	}
+
+	for (;;)
+	{
+		RBNode	   *came_from = rlw->last_visited;
+
+		rlw->last_visited = rlw->last_visited->parent;
+		if (rlw->last_visited == NULL)
+		{
+			rlw->is_over = true;
+			break;
+		}
+
+		if (rlw->last_visited->right == came_from)
+			break;/* came from right sub-tree, return current
+ * node */
+
+		/* else - came from left sub-tree, continue to move up */
+	}
+
+	return rlw->last_visited;
+}
+
+/*
+ * Begin direct walk (a.k.a pre-order traversal).
+ *
+ * Pre-order traversal while duplicating nodes and edges can make a complete
+ * duplicate of a binary tree. It can also be used to make a prefix expression
+ * (Polish notation) from expression trees: traverse the expression tree
+ * pre-orderly.
+ *
+ * https://en.wikipedia.org/wiki/Tree_traversal#Applications
+ */
+void
+rb_begin_direct_walk(RBTree *rb, RBTreeDirectWalk * dw)
+{
+	dw->rb = rb;
+	dw->last_visited = NULL;
+	dw->is_over = (dw->rb->root == RBNIL);
+}
+
+/*

Re: [HACKERS] copyParamList

2016-07-27 Thread Robert Haas
On Wed, Jul 27, 2016 at 10:09 AM, Amit Kapila  wrote:
> On Wed, Jul 27, 2016 at 6:46 PM, Robert Haas  wrote:
>> On Wed, Jul 27, 2016 at 2:20 AM, Amit Kapila  wrote:
>>> Okay, that makes sense to me apart from minor issue reported by
>>> Andrew.  I think we might want to slightly rephrase the comments on
>>> top of copyParamList() which indicates only about dynamic parameter
>>> hooks.
>>
>> I don't see why it needs to be changed - can you explain in more
>> detail what you have in mind?
>>
>
> Basically, after this function, users of the ParamListInfo don't need to
> care about the value of paramMask, as you have already ignored the params
> it excludes.  I think it is quite apparent from the code, but as similar
> information is mentioned for the dynamic parameter hooks, I thought
> we might want to update it.  If you don't feel the need, then
> leave it as it is.

Yeah, I don't see a need to mention that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2016-07-27 Thread Robert Haas
On Fri, Jul 22, 2016 at 7:15 AM, Anton Dignös  wrote:
> We would like to contribute to PostgreSQL a solution that supports the query
> processing of "at each time point". The basic idea is to offer two new
> operators, NORMALIZE and ALIGN, whose purpose is to adjust (or split) the
> ranges of tuples so that subsequent queries can use the usual grouping and
> equality conditions to get the intended results.

I think that it is great that you want to contribute your work to
PostgreSQL.  I don't know whether there will be a consensus that this
is generally useful functionality that we should accept, but I commend
the effort anyhow.  Assuming there is, getting this into a state that
we consider committable will probably take quite a bit of additional
work on your part; no one will do it for you.  If you're still
interested in proceeding given those caveats, please add your patch
here so that it gets reviewed:

https://commitfest.postgresql.org/action/commitfest_view/open

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] copyParamList

2016-07-27 Thread Amit Kapila
On Wed, Jul 27, 2016 at 6:46 PM, Robert Haas  wrote:
> On Wed, Jul 27, 2016 at 2:20 AM, Amit Kapila  wrote:
>> Okay, that makes sense to me apart from minor issue reported by
>> Andrew.  I think we might want to slightly rephrase the comments on
>> top of copyParamList() which indicates only about dynamic parameter
>> hooks.
>
> I don't see why it needs to be changed - can you explain in more
> detail what you have in mind?
>

Basically, after this function, users of the ParamListInfo don't need to
care about the value of paramMask, as you have already ignored the params
it excludes.  I think it is quite apparent from the code, but as similar
information is mentioned for the dynamic parameter hooks, I thought
we might want to update it.  If you don't feel the need, then
leave it as it is.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Macro customizable hashtable / bitmapscan & aggregation perf

2016-07-27 Thread Robert Haas
On Tue, Jul 26, 2016 at 8:43 PM, Andres Freund  wrote:
> As previously mentioned, dynahash isn't particularly fast. In many cases
> that doesn't particularly matter. But e.g. hash aggregation and bitmap
> index scans are quite sensitive to hash performance.
>
> The biggest problems with dynahash (also discussed at [1]) are
>
> a) two level structure doubling the amount of indirect lookups
> b) indirect function calls
> c) using separate chaining based conflict resolution
> d) being too general.

I am ... skeptical about open addressing.  It's less unappealing for
this application than in general because we don't actually need to
delete anything, but that wouldn't be true for the catcache.  All the
same, I feel that we need to assess the risk that we're improving
average-case performance while creating large regressions in other
cases (i.e. where there is clustering).

Do likely() and unlikely() actually buy us anything here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] No longer possible to query catalogs for index capabilities?

2016-07-27 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Yeah.  I'm not very impressed by the underlying assumption that it's
>> okay for client-side code to hard-wire knowledge about what indoption
>> bits mean, but not okay for it to hard-wire knowledge about which index
>> AMs use which indoption bits.  There's something fundamentally wrong
>> in that.  We don't let psql or pg_dump look directly at indoption, so
>> why would we think that third-party client-side code should do so?

> For my 2c, I'd like to see pg_dump able to use the catalog tables to
> derive the index definition, just as they manage to figure out table
> definitions without (for the most part) using functions.  More
> generally, I believe we should be working to reach a point where we can
> reconstruct all objects in the database using just the catalog, without
> any SQL bits being provided from special functions which access
> information that isn't available at the SQL level.

No, I reject that entirely.  It would be insane for example to expect that
random client-side code should be able to interpret the node trees stored
in places like pg_index.indexprs.  It's barely possible that we could
maintain such logic in pg_dump, though having to maintain a different
version for each supported server branch would be a giant PITA.  But do
you also want to maintain translated-into-Java copies of each of those
libraries for the benefit of JDBC?  Or any other language that client
code might be written in?

Now, obviously knowing which bit in pg_index.indoption does what would be
a few orders of magnitude less of a maintenance hazard than knowing what
expression node trees contain.  But that doesn't make it a good
future-proof thing for clients to be doing.  If the answer to the question
"why do you need access to pg_am.amcanorder?" is "so I can interpret the
bits in pg_index.indoption", I think it's clear that we've got an
abstraction failure that is not going to be fixed by just exposing
something equivalent to the old pg_am definition.

Building on the has-property approach Andrew suggested, I wonder if
we need something like pg_index_column_has_property(indexoid, colno,
propertyname) with properties like "sortable", "desc", "nulls first".

regards, tom lane




Re: [HACKERS] Why we lost Uber as a user

2016-07-27 Thread Merlin Moncure
On Tue, Jul 26, 2016 at 5:07 PM, Tom Lane  wrote:
> Josh Berkus  writes:
>> To explain this in concrete terms, which the blog post does not:
>
>> 1. Create a small table, but one with enough rows that indexes make
>> sense (say 50,000 rows).
>
>> 2. Make this table used in JOINs all over your database.
>
>> 3. To support these JOINs, index most of the columns in the small table.
>
>> 4. Now, update that small table 500 times per second.
>
>> That's a recipe for runaway table bloat; VACUUM can't do much because
>> there's always some minutes-old transaction hanging around (and SNAPSHOT
>> TOO OLD doesn't really help, we're talking about minutes here), and
>> because of all of the indexes HOT isn't effective.
>
> Hm, I'm not following why this is a disaster.  OK, you have circa 100%
> turnover of the table in the lifespan of the slower transactions, but I'd
> still expect vacuuming to be able to hold the bloat to some small integer
> multiple of the minimum possible table size.  (And if the table is small,
> that's still small.)  I suppose really long transactions (pg_dump?) could
> be pretty disastrous, but there are ways around that, like doing pg_dump
> on a slave.
>
> Or in short, this seems like an annoyance, not a time-for-a-new-database
> kind of problem.

Well, the real annoyance as I understand it is the raw volume of bytes
of WAL traffic a single update of a field can cause.  They switched to
statement level replication(!).

merlin




Re: [HACKERS] copyParamList

2016-07-27 Thread Robert Haas
On Wed, Jul 27, 2016 at 2:20 AM, Amit Kapila  wrote:
> Okay, that makes sense to me apart from minor issue reported by
> Andrew.  I think we might want to slightly rephrase the comments on
> top of copyParamList() which indicates only about dynamic parameter
> hooks.

I don't see why it needs to be changed - can you explain in more
detail what you have in mind?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] copyParamList

2016-07-27 Thread Robert Haas
On Wed, Jul 27, 2016 at 12:25 AM, Andrew Gierth
 wrote:
>> "Robert" == Robert Haas  writes:
>  Robert> So I think we instead ought to fix it as in the attached.
>
>  Robert>if (retval->paramMask != NULL &&
>  Robert> -  !bms_is_member(i, retval->paramMask))
>  Robert> +  !bms_is_member(i, from->paramMask))
>
> Need to change both references to retval->paramMask, not just one.

You are, of course, entirely correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE

2016-07-27 Thread Vik Fearing
On 27/07/16 06:11, David Fetter wrote:
> On Wed, Jul 27, 2016 at 03:24:28AM +0200, Vik Fearing wrote:
>> On 27/07/16 03:15, Peter Eisentraut wrote:
>>> On 7/26/16 6:14 PM, Vik Fearing wrote:
 As mentioned elsewhere in the thread, you can just do WHERE true
 to get around it, so why on Earth have it PGC_SUSET?
>>>
>>> I'm not sure whether it's supposed to guard against typos and
>>> possibly buggy SQL string concatenation in application code.  So
>>> it would help against accidental mistakes, whereas putting a WHERE
>>> TRUE in there would be an intentional override.
>>
>> If buggy SQL string concatenation in application code is your
>> argument, quite a lot of them add "WHERE true" so that they can just
>> append a bunch of "AND ..." clauses without worrying if it's the
>> first (or last, whatever), so I'm not sure this is protecting
>> anything.
> 
> I am sure that I'm not the only one who's been asked for this feature
> because people other than me have piped up on this thread to that very
> effect.

Sure.  I'm just saying that I think it is poorly designed.  I think it
would be far better to error out if the command affects x rows, or an
estimated y% of the table.

Doing that, and also allowing the user to turn it off, would solve the
problem as I understand your presentation of it.

> I understand that there may well be lots of really meticulous people
> on this list, people who would never accidentally do an unqualified
> DELETE on a table in production, but I can't claim to be one of them
> because I have, and not just once.  It's under once a decade, but even
> that's too many.

That doesn't mean that requiring a WHERE clause -- without even looking
at what's in it -- is a good idea.

Why not start by turning off autocommit, for example?

> I'm not proposing to make this feature default, or even available by
> default, but I am totally certain that this is the kind of feature
> people would really appreciate, even if it doesn't prevent every
> catastrophe.

This kind of feature, why not.  This feature, no.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MSVC pl-perl error message is not verbose enough

2016-07-27 Thread John Harvey
On Tue, Jul 26, 2016 at 9:44 PM, Michael Paquier 
wrote:

> On Wed, Jul 27, 2016 at 12:41 AM, John Harvey
>  wrote:
> > Because of this, I've submitted a small patch which fixes the verbosity
> of
> > the error message to actually explain what's missing.  I hope that this
> > patch will be considered for the community, and it would be nice if it
> was
> > back-patched.
>
> Improving this error message a bit looks like a good idea to me.
> Looking at the code to figure out what build.pl is looking for is a
> bit of a pain for users just wanting to compile the code, so if we can
> avoid such lookups in a cheap way, let's do it.
>
> Instead of your patch, I'd suggest saving $solution->{options}->{perl}
> . '\lib\CORE\perl*.lib' in a variable and then raise an error based on
> that. See attached.


Looks good to me.  I think this is a cleaner approach.
I've just verified that it works correctly, so I'm happy with it.

-John


Re: [HACKERS] Why we lost Uber as a user

2016-07-27 Thread Vik Fearing
On 27/07/16 05:45, Robert Haas wrote:
> On Tue, Jul 26, 2016 at 8:27 PM, Stephen Frost  wrote:
>> * Joshua D. Drake (j...@commandprompt.com) wrote:
>>> Hello,
>>>
>>> The following article is a very good look at some of our limitations
>>> and highlights some of the pains many of us have been working
>>> "around" since we started using the software.
>>>
>>> https://eng.uber.com/mysql-migration/
>>>
>>> Specifically:
>>>
>>> * Inefficient architecture for writes
>>> * Inefficient data replication
>>
>> The above are related and there are serious downsides to having an extra
>> mapping in the middle between the indexes and the heap.
>>
>> What makes me doubt just how well they understood the issues or what is
>> happening is the lack of any mention of hint bits or tuple freezing
>> (requiring additional writes).
> 
> Yeah.  A surprising amount of that post seemed to be devoted to
> describing how our MVCC architecture works rather than what problem
> they had with it.  I'm not saying we shouldn't take their bad
> experience seriously - we clearly should - but I don't feel like it's
> as clear as it could be about exactly where the breakdowns happened.

There is some more detailed information in this 30-minute talk:
https://vimeo.com/145842299
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing numeric SUM() aggregate

2016-07-27 Thread Tom Lane
Andrew Borodin  writes:
>> I don't think there is any reason for this new code to assume NBASE=10000

> There is a comment on line 64 stating that the value 10000 is hardcoded
> somewhere else, that any other value is not recommended, and that a bunch
> of code is left for historical reasons.

Doesn't matter: please use NBASE when you mean NBASE.  10000 is just a
magic number, and therefore bad coding style.  For that matter, spelling
INT_MAX as 0x7FFFFFFF is also not per project style.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] handling unconvertible error messages

2016-07-27 Thread Craig Ringer
On 25 July 2016 at 22:43, Peter Eisentraut  wrote:

> Example: I have a database cluster initialized with --locale=ru_RU.UTF-8
> (built with NLS).  Let's say for some reason, I have client encoding set
> to LATIN1.  All error messages come back like this:
>
> test=> select * from notthere;
> ERROR:  character with byte sequence 0xd0 0x9e in encoding "UTF8" has no
> equivalent in encoding "LATIN1"
>
> There is no straightforward way for the client to learn that there is a
> real error message, but it could not be converted.
>
> I think ideally we could make this better in two ways:
>
> 1) Send the original error message untranslated.  That would require
> saving the original error message in errmsg(), errdetail(), etc.  That
> would be a lot of work for only the occasional use.  But it would also
> facilitate an occasionally-requested feature of writing untranslated
> error messages into the server log or the csv log, while sending
> translated messages to the client (or some variant thereof).
>
> 2) Send an indication that there was an encoding problem.  Maybe a
> NOTICE, or an error context?  Wiring all this into elog.c looks a bit
> tricky, however.
>
>
We have a similar problem with the server logs. But there, there's also an
additional problem: if there isn't any character-mapping issue, we just
totally ignore text-encoding concerns and write into the log files in
whatever encoding the client asked the backend to use. So log files can be a
line-by-line mix of UTF-8, ISO-8859-1, and whatever other fun encodings
someone asks for. There is *no* way to correctly read such a file, since
lines carry no marking as to their encoding, and no tools out there support
line-by-line differently encoded text files anyway.

I'm not sure how closely it ties in to the issue you mention, but I think
it's at least related enough to keep in mind while considering the
client_encoding issue.

I suggest (3) "log the message with unmappable characters masked". Though I
would definitely like to also be able to send the raw original, along with a
field indicating the original's encoding (since it won't be the
client_encoding), because we need some way to get at the info.
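
As a rough, self-contained illustration of the spirit of (3): this is not
PostgreSQL code, and it is deliberately much cruder than a real
encoding-aware conversion, since it simply masks every non-ASCII byte.

    #include <stdio.h>

    /* Replace any byte that is not plain ASCII with '?' so the line can be
     * read in a single-encoding log file, at the cost of losing the
     * original characters. */
    static void mask_non_ascii(char *msg)
    {
        char *p;

        for (p = msg; *p != '\0'; p++)
        {
            if ((unsigned char) *p > 0x7F)
                *p = '?';
        }
    }

    int main(void)
    {
        char msg[] = "ERROR:  relation \"caf\xc3\xa9\" does not exist";
        mask_non_ascii(msg);
        puts(msg);    /* prints: ERROR:  relation "caf??" does not exist */
        return 0;
    }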

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Optimizing numeric SUM() aggregate

2016-07-27 Thread Dean Rasheed
On 27 July 2016 at 10:17, Andrew Borodin  wrote:
>>   if (accum->maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
> Worth noting that (INT_MAX - INT_MAX / NBASE) / (NBASE - 1) == INT_MAX
> / NBASE for any NBASE > 1

Interesting. I think it's clearer the way it is in mul_var() though,
because the intention is to avoid letting digits exceed INT_MAX -
INT_MAX/NBASE, so that there is no danger of overflow in the carry
propagation step. The long form makes that clearer (and is just as
efficient, since these constant expressions will be folded at compile
time).
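
For anyone who wants to double-check the equality noted above, here is a
tiny standalone check (not part of numeric.c; NBASE is just redefined
locally with its usual value of 10000):

    #include <limits.h>
    #include <stdio.h>

    #define NBASE 10000        /* same value as in numeric.c */

    int main(void)
    {
        /* The mul_var()-style guard and the shortened form coincide for
         * NBASE = 10000: both evaluate to 214748. */
        int long_form  = (INT_MAX - INT_MAX / NBASE) / (NBASE - 1);
        int short_form = INT_MAX / NBASE;

        printf("%d %d\n", long_form, short_form);
        return 0;
    }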

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing numeric SUM() aggregate

2016-07-27 Thread Dean Rasheed
On 27 July 2016 at 09:57, Dean Rasheed  wrote:
> it could be
> coded using something like
>
> accum->maxdig += NBASE - 1;
> if (accum->maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
> Propagate carries...
>

Oops, that's not quite right, because maxdig actually represents the
max possible value divided by NBASE-1 in mul_var(), so it ought to have
been accum->maxdig++ in the first line.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing numeric SUM() aggregate

2016-07-27 Thread Andrew Borodin
>   if (accum->maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
Worth noting that (INT_MAX - INT_MAX / NBASE) / (NBASE - 1) == INT_MAX
/ NBASE for any NBASE > 1
>I don't think there is any reason for this new code to assume NBASE=10000
There is a comment on line 64 stating that the value 10000 is hardcoded
somewhere else, that any other value is not recommended, and that a bunch
of code is left for historical reasons.

Best regards, Andrey Borodin, Octonica & Ural Federal University.

2016-07-27 13:57 GMT+05:00 Dean Rasheed :
> On 27 July 2016 at 07:33, Andrew Borodin  wrote:
>>>I think we could do carry every 0x7FFFFFFF / 10000 accumulation, couldn't we?
>>
>> I feel that I have to elaborate a bit. Probably my calculations are wrong.
>>
>> Let's assume we have already accumulated INT_MAX worth of 9999-valued
>> digits in the previous-place accumulator. That's almost overflow, but
>> that's not overflow. Carrying that accumulator into the current one gives
>> us an INT_MAX/10000 carried sum.
>> So in the current-place accumulator we can accumulate: ( INT_MAX - INT_MAX
>> / 10000 ) / 9999, where 9999 is the max value dropped into the
>> current-place accumulator on each addition.
>> That is INT_MAX * 9999 / 99990000, or simply INT_MAX / 10000.
>>
>> If we use an unsigned 32-bit integer, that is 429496, which means carrying
>> 43 times less frequently.
>>
>> As a bonus, we get rid of the 9999 const in the code (:
>>
>> Please correct me if I'm wrong.
>>
>
> This is basically the same problem that has already been solved in
> mul_var() and other places in numeric.c, so in this case it could be
> coded using something like
>
> accum->maxdig += NBASE - 1;
> if (accum->maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
> Propagate carries...
>
> I agree that the new code should avoid explicitly referring to
> constants like 9999, and I don't think there is any reason for this
> new code to assume NBASE=10000.
>
> Regards,
> Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-07-27 Thread Etsuro Fujita

On 2016/07/27 13:51, Kouhei Kaigai wrote:

This output is, at least, not incorrect.
This ForeignScan actually scans a relation that consists of two joined
tables on the remote side. So it is not incorrect, but it may not be
convenient for users who don't have deep internal knowledge to understand.


Hmm.


On the other hand, I cannot be happy with your patch because it concludes
that the role of a ForeignScan/CustomScan with scanrelid==0 is always a join.
However, that does not cover all the cases.

When postgres_fdw gets support for remote partial aggregates, we can
implement that feature using a ForeignScan with scanrelid==0. Is it a join? No.


Yeah, I think that could be implemented in both cases (scanrelid>0
and scanrelid=0).



Probably the core cannot know the exact purpose of a ForeignScan/CustomScan
with scanrelid==0. It can be a remote partial aggregation. It can be
alternative sort logic on a GPU. It can be something else.
So I think the extension needs to give the core a more proper label to show,
including a flag to control whether the relation name(s) shall be attached
next to the ForeignScan/CustomScan line.

I'd like to suggest adding the two fields below:
 - An alternative label the extension wants to show instead of the default one


How about adding something like the "Additional Operations" line as  
proposed in a previous email, instead?  As you know, FDWs/Extensions  
could add such information through ExplainForeignScan/ExplainCustomScan.



 - A flag to turn on/off printing the relation name(s)


ISTM that the relation information should always be printed by the core,
even in the case of scanrelid=0, because that would be essential for
analyzing EXPLAIN results.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing numeric SUM() aggregate

2016-07-27 Thread Dean Rasheed
On 27 July 2016 at 07:33, Andrew Borodin  wrote:
>>I think we could do carry every 0x7FFFFFFF / 10000 accumulation, couldn't we?
>
> I feel that I have to elaborate a bit. Probably my calculations are wrong.
>
> Let's assume we have already accumulated INT_MAX worth of 9999-valued
> digits in the previous-place accumulator. That's almost overflow, but
> that's not overflow. Carrying that accumulator into the current one gives
> us an INT_MAX/10000 carried sum.
> So in the current-place accumulator we can accumulate: ( INT_MAX - INT_MAX
> / 10000 ) / 9999, where 9999 is the max value dropped into the
> current-place accumulator on each addition.
> That is INT_MAX * 9999 / 99990000, or simply INT_MAX / 10000.
>
> If we use an unsigned 32-bit integer, that is 429496, which means carrying
> 43 times less frequently.
>
> As a bonus, we get rid of the 9999 const in the code (:
>
> Please correct me if I'm wrong.
>

This is basically the same problem that has already been solved in
mul_var() and other places in numeric.c, so in this case it could be
coded using something like

accum->maxdig += NBASE - 1;
if (accum->maxdig > (INT_MAX - INT_MAX / NBASE) / (NBASE - 1))
Propagate carries...

I agree that the new code should avoid explicitly referring to
constants like 9999, and I don't think there is any reason for this
new code to assume NBASE=10000.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Oddity in EXPLAIN for foreign/custom join pushdown plans

2016-07-27 Thread Etsuro Fujita

On 2016/07/27 13:09, Ashutosh Bapat wrote:

The patch always prints ForeignJoin when scanrelid <= 0, which would be
odd considering that FDWs can now push down post-join operations. We
need to devise a better way to convey post-join operations. Maybe
something like
Foreign Grouping, aggregation on ...
Foreign Join on ...


Good point!

The point of the proposed patch is to print "Foreign Join"/"Custom 
Join", to show, in every text/xml/json/yaml format, that there are 
multiple target relations involved in that join and to print those 
relations, following the "Foreign Join"/"Custom Join" words, which I 
think would be more machine-readable.



But then the question is that a foreign scan node can be pushing down many
operations together, e.g. join, aggregation, and sort, OR join, aggregation,
and windowing, OR join and insert. How would we clearly convey this? Maybe
we say
Foreign Scan
operations: join on ..., aggregation, ...

That wouldn't be so great and might be clumsy for many operations. Any
better idea?


How about: when doing post scan/join operations remotely, print such 
additional operations in EXPLAIN the same way as in the "Relations" or 
"Remote SQL" info printed by postgres_fdw.  Maybe something like this:


  Foreign Scan/Join on target relation(s)
Output: ...
Relations: ...
Additional Operations: grouping, aggregate, ...
Remote SQL: ...

That seems not that clumsy to me.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing numeric SUM() aggregate

2016-07-27 Thread Andrew Borodin
>I think we could do carry every 0x7FFFFFFF / 10000 accumulation, couldn't we?

I feel that I have to elaborate a bit. Probably my calculations are wrong.

Let's assume we have already accumulated INT_MAX worth of 9999-valued
digits in the previous-place accumulator. That's almost overflow, but
that's not overflow. Carrying that accumulator into the current one gives
us an INT_MAX/10000 carried sum.
So in the current-place accumulator we can accumulate: ( INT_MAX - INT_MAX
/ 10000 ) / 9999, where 9999 is the max value dropped into the
current-place accumulator on each addition.
That is INT_MAX * 9999 / 99990000, or simply INT_MAX / 10000.

If we use an unsigned 32-bit integer, that is 429496, which means carrying
43 times less frequently.

As a bonus, we get rid of the 9999 const in the code (:

Please correct me if I'm wrong.


Best regards, Andrey Borodin, Octonica & Ural Federal University.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dumping extensions having sequences with 9.6beta3

2016-07-27 Thread Michael Paquier
On Wed, Jul 27, 2016 at 8:07 AM, Stephen Frost  wrote:
> That'd be great.  It's definitely on my list of things to look into, but
> I'm extremely busy this week.  I hope to look into it on Friday, would
> be great to see what you find.

Sequences that are directly defined in extensions do not get dumped,
but sequences that are part of a serial column in an extension are
getting dumped. Looking into this problem, getOwnedSeqs() is visibly
making an incorrect assumption: sequences owned by table columns are
dumped unconditionally, but that should not hold for sequences that
are part of extensions. More precisely, dobj->dump is being forced to
DUMP_COMPONENT_ALL, which causes the sequence definition to be dumped.
Oops.

The patch attached fixes the problem for me. I have added as well
tests in test_pg_dump in the shape of sequences defined in an
extension, and sequences that are part of a serial column. This patch
is also able to work in the case where a sequence is created as part
of a serial column, and gets removed after, say with ALTER EXTENSION
DROP SEQUENCE. The behavior for sequences and serial columns that are
not part of extensions is unchanged.

Stephen, it would be good if you could check the correctness of this
patch, as you did all the refactoring of pg_dump to support catalog
ACLs. I am sure, by the way, that checking for (owning_tab->dobj.dump &
DUMP_COMPONENT_DEFINITION) != 0 is not good because of, for example, the
case of a serial column created in an extension where the sequence is
dropped from the extension afterwards.
-- 
Michael
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 08c2b0c..0278995 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -6037,6 +6037,8 @@ getOwnedSeqs(Archive *fout, TableInfo tblinfo[], int numTables)
 			continue;			/* not an owned sequence */
 		if (seqinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
 			continue;			/* no need to search */
+		if (seqinfo->dobj.ext_member)
+			continue;			/* member of an extension */
 		owning_tab = findTableByOid(seqinfo->owning_tab);
 		if (owning_tab && owning_tab->dobj.dump)
 		{
diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
index fd9c37f..9caee93 100644
--- a/src/test/modules/test_pg_dump/t/001_base.pl
+++ b/src/test/modules/test_pg_dump/t/001_base.pl
@@ -226,7 +226,7 @@ my %tests = (
 	'CREATE TABLE regress_pg_dump_table' => {
 		regexp => qr/^
 			\QCREATE TABLE regress_pg_dump_table (\E
-			\n\s+\Qcol1 integer,\E
+			\n\s+\Qcol1 integer NOT NULL,\E
 			\n\s+\Qcol2 integer\E
 			\n\);$/xm,
 		like   => { binary_upgrade => 1, },
@@ -241,6 +241,48 @@ my %tests = (
 			schema_only=> 1,
 			section_pre_data   => 1,
 			section_post_data  => 1, }, },
+	'CREATE SEQUENCE regress_pg_dump_table_col1_seq' => {
+		regexp => qr/^
+			\QCREATE SEQUENCE regress_pg_dump_table_col1_seq\E
+			\n\s+\QSTART WITH 1\E
+			\n\s+\QINCREMENT BY 1\E
+			\n\s+\QNO MINVALUE\E
+			\n\s+\QNO MAXVALUE\E
+			\n\s+\QCACHE 1;\E
+			$/xm,
+		like   => { binary_upgrade => 1, },
+		unlike => {
+			clean  => 1,
+			clean_if_exists=> 1,
+			createdb   => 1,
+			defaults   => 1,
+			no_privs   => 1,
+			no_owner   => 1,
+			pg_dumpall_globals => 1,
+			schema_only=> 1,
+			section_pre_data   => 1,
+			section_post_data  => 1, }, },
+	'CREATE SEQUENCE regress_pg_dump_seq' => {
+		regexp => qr/^
+			\QCREATE SEQUENCE regress_pg_dump_seq\E
+			\n\s+\QSTART WITH 1\E
+			\n\s+\QINCREMENT BY 1\E
+			\n\s+\QNO MINVALUE\E
+			\n\s+\QNO MAXVALUE\E
+			\n\s+\QCACHE 1;\E
+			$/xm,
+		like   => { binary_upgrade => 1, },
+		unlike => {
+			clean  => 1,
+			clean_if_exists=> 1,
+			createdb   => 1,
+			defaults   => 1,
+			no_privs   => 1,
+			no_owner   => 1,
+			pg_dumpall_globals => 1,
+			schema_only=> 1,
+			section_pre_data   => 1,
+			section_post_data  => 1, }, },
 	'CREATE ACCESS METHOD regress_test_am' => {
 		regexp => qr/^
 			\QCREATE ACCESS METHOD regress_test_am TYPE INDEX HANDLER bthandler;\E
diff --git a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
index 5fe6063..93de2c5 100644
--- a/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
+++ b/src/test/modules/test_pg_dump/test_pg_dump--1.0.sql
@@ -4,10 +4,12 @@
 \echo Use "CREATE EXTENSION test_pg_dump" to load this file. \quit
 
 CREATE TABLE regress_pg_dump_table (
-	col1 int,
+	col1 serial,
 	col2 int
 );
 
+CREATE SEQUENCE regress_pg_dump_seq;
+
 GRANT SELECT ON regress_pg_dump_table TO regress_dump_test_role;
 GRANT SELECT(col1) ON regress_pg_dump_table TO public;
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] copyParamList

2016-07-27 Thread Amit Kapila
On Wed, Jul 27, 2016 at 2:03 AM, Robert Haas  wrote:
> On Fri, Jul 8, 2016 at 2:28 AM, Amit Kapila  wrote:
>> On Tue, May 31, 2016 at 10:10 PM, Robert Haas  wrote:
>>> On Fri, May 27, 2016 at 6:07 PM, Andrew Gierth
>>>  wrote:
 copyParamList does not respect from->paramMask, in what looks to me like
 an obvious oversight:

 retval->paramMask = NULL;
 [...]
 /* Ignore parameters we don't need, to save cycles and space. */
 if (retval->paramMask != NULL &&
 !bms_is_member(i, retval->paramMask))

 retval->paramMask is never set to anything not NULL in this function,
 so surely that should either be initializing it to from->paramMask, or
 checking from->paramMask in the conditional?
>>>
>>> Oh, dear.  I think you are right.  I'm kind of surprised this didn't
>>> provoke a test failure somewhere.
>>>
>>
>> The reason why it didn't provoke any test failure is that it doesn't
>> seem to be possible that from->paramMask in copyParamList can ever be
>> non-NULL. Params formed will always have paramMask set as NULL in the
>> code paths where copyParamList is used.  I think we can just assign
>> from->paramMask to retval->paramMask to make sure that even if it gets
>> used in future, the code works as expected.  Alternatively, one might
>> think of adding an Assert there, but that doesn't seem to be
>> future-proof.
>
> Hmm, I don't think this is the right fix.  The point of the
> bms_is_member test is that we don't want to copy any parameters that
> aren't needed; but if we avoid copying parameters that aren't needed,
> then it's fine for the new ParamListInfo to have a paramMask of NULL.
> So I think it's actually correct that we set it that way, and I think
> it's better than saving a pointer to the original paramMask,
> because (1) it's probably slightly faster to have it as NULL and (2)
> the old paramMask might not be allocated in the same memory context as
> the new ParamListInfo.  So I think we instead ought to fix it as in
> the attached.
>

Okay, that makes sense to me apart from the minor issue reported by
Andrew.  I think we might want to slightly rephrase the comments on
top of copyParamList(), which only mention dynamic parameter hooks.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers