Re: [HACKERS] Add pg_settings.pending_restart column

2015-02-15 Thread David G Johnston
Peter Eisentraut-2 wrote
 So here is a patch for that.  It adds a column pending_restart to
 pg_settings that is true when the configuration file contains a changed
 setting that requires a restart.  We already had the logic to detect
 such changes, for producing the log entry.  I have also set it up so
 that if you change your mind and undo the setting and reload the server,
 the pending_restart flag is reset to false.
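For context, once the column exists a DBA could spot restart-pending settings
with a query along these lines (illustrative only; it assumes the column
behaves as described in the quoted text):

-- settings changed in the configuration file that still require a restart
SELECT name, setting, pending_restart
FROM pg_settings
WHERE pending_restart;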

Doc typo: s/of/if/

Otherwise it seems fine, but I cannot help but feel that false positives are
possible; though nothing that doesn't already exist.  Mainly, is the change
going to end up affecting only the reset or default value but not the
currently active value?

Instead of a boolean I'd rather have a string/enum that can capture the fact
that a reboot is required and the expected outcome.  The same field can then
communicate non-reboot stuff too - like SIGHUP required or session
scope.

A simple "reboot required" boolean API could just as easily be provided via a
function; and why limit it to just reboots, and not reconnection or SIGHUP
as well?

Scope creep, admittedly, but the reboot case doesn't seem that special overall,
other than the effort needed to realize the updated value.

David J.










Re: [HACKERS] KNN-GiST with recheck

2015-02-15 Thread Alexander Korotkov
On Thu, Jan 8, 2015 at 1:12 AM, Alexander Korotkov aekorot...@gmail.com
wrote:

 On Tue, Dec 16, 2014 at 4:37 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 Patch attached. It should be applied on top of my pairing heap patch at
 http://www.postgresql.org/message-id/548ffa2c.7060...@vmware.com. Some
 caveats:

 * The signature of the distance function is unchanged, it doesn't get a
 recheck argument. It is just assumed that if the consistent function sets
 the recheck flag, then the distance needs to be rechecked as well. We might
 want to add the recheck argument, like you Alexander did in your patch, but
 it's not important right now.


 I didn't get how that is expected to work if we have only an ORDER BY qual
 without a filter qual. In this case the consistent function just isn't called
 at all.

 * I used the distance term in the executor, although the ORDER BY expr
 machinery is more general than that. The value returned by the ORDER BY
 expression doesn't have to be a distance, although that's the only thing
 supported by GiST and the built-in opclasses.

 * I short-circuited the planner to assume that the ORDER BY expression
 always returns a float. That's true today for knn-GiST, but is obviously a
 bogus assumption in general.

 This needs some work to get into a committable state, but from a
 modularity point of view, this is much better than having the indexam to
 peek into the heap.


 Nice idea to put the reordering into the index scan node. It doesn't look
 like much overengineering. I'm going to bring it to a more committable state.


The following changes have been made in the attached patch:

 * Get sort operators from pathkeys.
 * The recheck argument of the distance function has been reverted.
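
For reference, the kind of query this reordering targets is an ordinary
nearest-neighbour scan, for example (standard syntax; the table is purely
illustrative, and with plain points the index distance is already exact --
the recheck matters for types where the index can only return an
approximation):

CREATE TABLE pts (p point);
CREATE INDEX ON pts USING gist (p);

-- ORDER BY a distance expression; with the patch, tuples whose
-- index-reported distance is only a lower bound can be rechecked and
-- reordered inside the index scan node.
SELECT * FROM pts ORDER BY p <-> point '(0,0)' LIMIT 10;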

--
With best regards,
Alexander Korotkov.


knn-gist-recheck-5.patch
Description: Binary data



Re: [HACKERS] restrict global access to be readonly

2015-02-15 Thread David G Johnston
happy times wrote
 Sure, we can utilize the runtime parameter
 default_transaction_read_only; however, it does not restrict users from
 changing the transaction attribute to non-read-only mode, so it is not safe.
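
To illustrate the gap being described (this is current behavior, not a
proposed change; mydb and t are placeholder names):

-- an administrator sets the default...
ALTER DATABASE mydb SET default_transaction_read_only = on;

-- ...but, in a new session, any ordinary user can simply override it:
SET default_transaction_read_only = off;
-- or escape it for a single transaction:
BEGIN;
SET TRANSACTION READ WRITE;
UPDATE t SET x = 1;   -- writes go through
COMMIT;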

ISTM that implementing a means to make this setting only super-user
changeable would be a quick(er) solution to the problem - the decision as to
what to disallow is already decided.

It seems like it would have to be a separate GUC, but it would not require any
new SQL; it would simply leverage the existing settings system to set up
database/user values that only a super-user can change.

David J.








Re: [HACKERS] forward vs backward slashes in msvc build code

2015-02-15 Thread Michael Paquier
On Sun, Feb 15, 2015 at 12:51 PM, Peter Eisentraut pete...@gmx.net wrote:
 I understand that on Windows, you can use forward slashes in path names
 interchangeably with backward slashes (except possibly in the shell).  I
 have developed the attached patch to change the msvc build code to use
 forward slashes consistently.  Together with the other attached patch,
 which is an unpolished hack, this allows me to run the build.pl script
 on not-Windows.  It won't actually build, but it will create all the
 project files.  That way, I can check whether some makefile hackery
 would break the Windows build.

This sounds like a good idea to improve the checking of the Windows build
scripts, like the presence of win32rc files, etc.

 Could someone verify this please?

I quickly tested the second patch with MSVC 2010 and I am getting a
build failure: chkpass cannot complete because crypt is missing. On
master the build passes. More details here:
C:\Users\mpaquier\git\postgres\pgsql.sln (default target) (1) ->
C:\Users\mpaquier\git\postgres\chkpass.vcxproj (default target) (36) ->
(Link target) ->
  chkpass.obj : error LNK2019: unresolved external symbol crypt
referenced in function chkpass_in
[C:\Users\ioltas\git\postgres\chkpass.vcxproj]
  .\Release\chkpass\chkpass.dll : fatal error LNK1120: 1 unresolved
externals [C:\Users\mpaquier\git\postgres\chkpass.vcxproj]

0 Warning(s)
2 Error(s)

Regards,
-- 
Michael




Re: [HACKERS] Logical Replication Helpers WIP for discussion

2015-02-15 Thread Petr Jelinek

On 13/02/15 14:04, Petr Jelinek wrote:

On 13/02/15 08:48, Michael Paquier wrote:


Looking at this patch, I don't see that we gain much here
except a decoder plugin that speaks a special protocol for a special
background worker that has not been presented yet. What actually is the
value of having that defined as a contrib/ module in core? Note that we
already have test_decoding to basically test the logical decoding facility,
used at least at the SQL level to get logical changes decoded.

Based on those reasons I am planning to mark this as rejected (it has no
documentation as well). So please speak up if you think otherwise,
but it seems to me that this could live happily out of core.


I think you are missing the point of this; it's not meant to be committed in
this form at all, and even less as a contrib module. It was meant as a basis
for discussion of in-core logical replication, but sadly I didn't really
have time to pursue it in this CF in the end.



That being said, and looking at the size of the February CF, I think I am 
fine with dropping this from the 9.5 cycle; it does not seem likely that 
anything useful will be done with this fast enough to get into 9.5, so 
there is no point in spending committer resources on it in the final CF.


I will pick it up again after the CF is done.

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Really bad blowups with hash outer join and nulls

2015-02-15 Thread Tom Lane
Andrew Gierth and...@tao11.riddles.org.uk writes:
 A quick test suggests that initializing the hash value to ~0 rather than
 0 has a curious effect: the number of batches still explodes, but the
 performance does not suffer the same way. (I think because almost all
 the batches end up empty.) I think this is worth doing even in the
 absence of a more general solution; nulls are common enough and
 important enough that they shouldn't be the worst-case value if it can
 be avoided.

I think that's unlikely to be a path to a good solution.

At least part of the problem here is that estimate_hash_bucketsize()
supposes that nulls can be ignored --- which is normally true, and
invalidates your claim that they're common.  But in a RIGHT JOIN
situation, they need to be considered as if they were regular keys.
That would probably be sufficient to dissuade the planner from choosing
a hash join in this example.
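
A minimal sketch of the shape being discussed (not the original reporter's
query; all names and sizes are made up):

CREATE TABLE t_small (id int PRIMARY KEY);
CREATE TABLE t_big   (ref int);   -- join key is mostly NULL

INSERT INTO t_small SELECT g FROM generate_series(1, 1000) g;
INSERT INTO t_big
    SELECT CASE WHEN g % 100 = 0 THEN g % 1000 END
    FROM generate_series(1, 1000000) g;
ANALYZE t_small;
ANALYZE t_big;

-- If the planner hashes t_big (executing this as a Hash Right Join so that
-- the unmatched NULL-key rows can still be emitted), every NULL lands in
-- hash bucket/batch 0, which is the blowup case; whether a given data set
-- actually triggers that plan choice depends on the statistics.
EXPLAIN SELECT * FROM t_small LEFT JOIN t_big ON t_small.id = t_big.ref;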

There may also be something we can do in the executor, but it would
take closer analysis to figure out what's going wrong.  I don't think
kluging the behavior for NULL in particular is the answer.

regards, tom lane




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-02-15 Thread Kouhei Kaigai
The attached patch is a rebased version of join replacement with
foreign-/custom-scan. There are no feature updates at this moment,
but SGML documentation has been added (per Michael's comment).

This infrastructure allows a foreign-data wrapper or custom-scan
provider to add alternative scan paths for a relation join.
From the viewpoint of the executor, it looks like a scan on a pseudo-
relation that is materialized from multiple relations, even though
the FDW/CSP internally processes the join with its own logic.

The basic idea is that (1) scanrelid==0 indicates this foreign/custom
scan node runs on a pseudo relation, and (2) fdw_ps_tlist and
custom_ps_tlist carry the definition of the pseudo relation; because
it is not associated with a tangible relation, unlike the simple scan
case, the planner cannot know the expected record type to be returned
without this additional information.
These two enhancements enable extensions to process relation joins
internally and to behave like an existing scan node from the viewpoint
of the core backend.

Also, as an aside, I had an off-list discussion with Hanada-san about
this interface. He had an idea to keep create_plan_recurse() static,
using a special list field in the CustomPath structure to chain the
underlying Path nodes. If the core backend translates the Path nodes
to Plan nodes when a valid list is given, the extension does not need
to call create_plan_recurse() by itself.
I have no preference about this. Does anybody have an opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai
 Sent: Thursday, January 15, 2015 8:03 AM
 To: Robert Haas
 Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
 Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan
 API)
 
  On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai kai...@ak.jp.nec.com
 wrote:
   When custom-scan node replaced a join-plan, it shall have at least
   two child plan-nodes. The callback handler of PlanCustomPath needs
   to be able to call create_plan_recurse() to transform the underlying
   path-nodes to plan-nodes, because this custom-scan node may take
   other built-in scan or sub-join nodes as its inner/outer input.
   In case of FDW, it shall kick any underlying scan relations to
   remote side, thus we may not expect ForeignScan has underlying plans...
 
  Do you have an example of this?
 
 Yes, even though full code set is too large for patch submission...
 
 https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880
 
 This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin.
 It takes GpuHashJoinPath inherited from CustomPath that has multiple
 underlying scan/join paths.
 Once it is called back from the backend, it also calls create_plan_recurse()
 to make inner/outer plan nodes according to the paths.
 
 In the result, we can see the following query execution plan that CustomScan
 takes underlying scan plans.
 
 postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
 QUERY PLAN
 --
 
  Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922
 width=143)
Hash clause 1: (aid = aid)
Hash clause 2: (bid = bid)
Bulkload: On
->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=409
 width=77)
->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=4
 width=37)
  hash keys: aid
  nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
  ->  Seq Scan on t1  (cost=0.00..734.00 rows=4 width=37)
  ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=4
 width=37)
hash keys: bid
nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
->  Seq Scan on t2  (cost=0.00..734.00 rows=4
 width=37)
 (13 rows)
 
 Thanks,
 --
 NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
 kai...@ak.jp.nec.com
 
 
  -Original Message-
  From: Robert Haas [mailto:robertmh...@gmail.com]
  Sent: Thursday, January 15, 2015 2:07 AM
  To: Kaigai Kouhei(海外 浩平)
  Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
  Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS]
  [v9.5] Custom Plan API)
 
  On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai kai...@ak.jp.nec.com
 wrote:
   When custom-scan node replaced a join-plan, it shall have at least
   two child plan-nodes. The callback handler of PlanCustomPath needs
   to be able to call create_plan_recurse() to transform the underlying
   path-nodes to plan-nodes, because this custom-scan node may take
   other built-in scan or sub-join nodes as its inner/outer input.
   In case of FDW, it shall kick any underlying scan relations to
   remote side, thus we may not expect ForeignScan has underlying plans...
 
  Do you 

[HACKERS] NOT NULL markings for BKI columns

2015-02-15 Thread Andres Freund
Hi,

8b38a538c0aa5a432dacd90f10805dc667a3d1a0 changed things so that all
columns are checked for NOT NULL-ness. Which is fine in general, but it
currently makes it impossible to have a varlena column (except
OID/INT2 vector...) as a key column in a syscache, even if the column is
factually NOT NULL.

The rule for determining an attribute's NOT NULL setting in bootstrap is:
	/*
	 * Mark as not null if type is fixed-width and prior columns are too.
	 * This corresponds to case where column can be accessed directly via C
	 * struct declaration.
	 *
	 * oidvector and int2vector are also treated as not-nullable, even though
	 * they are no longer fixed-width.
	 */
#define MARKNOTNULL(att) \
	((att)->attlen > 0 || \
	 (att)->atttypid == OIDVECTOROID || \
	 (att)->atttypid == INT2VECTOROID)

	if (MARKNOTNULL(attrtypes[attnum]))
	{
		int			i;

		for (i = 0; i < attnum; i++)
		{
			if (!MARKNOTNULL(attrtypes[i]))
				break;
		}
		if (i == attnum)
			attrtypes[attnum]->attnotnull = true;
	}
(the rule is also encoded in genbki.pl)

Now, you can argue that it's a folly to use the syscache code to access
something via a text column (which is what I want to do). I don't agree,
but even if you're of that position, it seems worthwhile to mark further
catalog columns as NOT NULL in the catalog if that's what the code
expects?

E.g. pg_(sh)?seclabel.provider should certainly be NOT
NULL. pg_extension.extversion as well. There are a couple more, I think.
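
For anyone following along, the current markings are easy to inspect with an
ordinary catalog query, e.g. for pg_extension:

SELECT attname, atttypid::regtype, attnotnull
FROM pg_attribute
WHERE attrelid = 'pg_extension'::regclass AND attnum > 0;
-- extversion currently comes back with attnotnull = false, even though the
-- code never actually stores a NULL there.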

So, how about providing bootstrap infrastructure for marking columns as
NOT NULL?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] mogrify and indent features for jsonb

2015-02-15 Thread Andrew Dunstan


On 02/15/2015 11:47 AM, Sehrope Sarkuni wrote:

For jsonb_indent, how about having it match up closer to the
JavaScript JSON.stringify(value, replacer, space)[1]? That way a user
can specify the indentation level and optionally filter the fields
they'd like to output.

In JS, the replacer parameter can be either a JS function or an
array of property names. I don't think the former is really possible
(in a SQL callable function) but the latter would be a text[]. The
space parameter can be either a string (used directly) or a number
(corresponding number of spaces).

The PG function signatures would be something like:

CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, replacer text[],
space text)
CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, replacer text[],
space int)

For convenience we could also include overloads with replacer removed
(since most people probably want the entire object):

CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, space text)
CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, space int)

Having json_stringify versions of these would be useful as well.




I think if you want these things, especially the filtering, you should 
probably load PLV8.


We could probably do the rest, but I'm not sure it's worth doing given 
that PLV8 is available for all of it.


cheers

andrew





Re: [HACKERS] NOT NULL markings for BKI columns

2015-02-15 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-02-15 12:31:10 -0500, Tom Lane wrote:
 Where are you thinking of sticking that exactly, and will pgindent
 do something sane with it?

 Hm, I was thinking about
 	/* extversion should never be null, but the others can be. */
 	text		extversion PG_FORCENOTNULL;	/* extension version name */
 but pgindent then removes some of the space between text and extversion,
 making it
 	text extversion PG_FORCENOTNULL;	/* extension version name */
 an alternative where it doesn't do that is
 	text		PG_FORCENOTNULL(extversion);	/* extension version name */

 Not sure what's the best way here.

The former is clearly a lot more sane semantically, so I'd go with
that even if the whitespace is a bit less nice.

I notice that pgindent does a similar not-very-nice thing with
PG_USED_FOR_ASSERTS_ONLY.  I wonder if we could hack it to handle
those two identifiers specially?

BTW, the precedent of PG_USED_FOR_ASSERTS_ONLY would suggest calling
this one PG_FORCE_NOT_NULL, or at least using underscores for word
breaks in whatever we end up calling it.

regards, tom lane




Re: [HACKERS] NOT NULL markings for BKI columns

2015-02-15 Thread Andres Freund
On 2015-02-15 12:11:52 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  So, how about providing bootstrap infrastructure for marking columns as
  NOT NULL?
 
 We've not desperately needed it up to now, but if you can think of a clean
 representation, go for it.  I'd want to preserve the property that all
 columns accessible via C structs are automatically NOT NULL though; too
 much risk of breakage otherwise.

Agreed. I think that the noise of changing all existing attributes to
have the marker would far outweigh the cleanliness of being
consistent. I was thinking of adding BKI_FORCENOTNULL, which would be
specified on the attributes where you want it. The FORCE in there
represents that the default choice is overridden.

On a first glance that doesn't look too hard.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] NOT NULL markings for BKI columns

2015-02-15 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I was thinking of adding BKI_FORCENOTNULL, which would be
 specified on the attributes where you want it. The FORCE in there
 represents that the default choice is overridden.

Where are you thinking of sticking that exactly, and will pgindent
do something sane with it?

regards, tom lane




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-15 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 What this patch does is add a GUC called old_snapshot_threshold.  It
 defaults to -1, which leaves behavior matching unpatched code.
 Above that it allows tuples to be vacuumed away after the number of
 transaction IDs specified by the GUC have been consumed.

 TBH, I'm not sure why we'd wish to emulate Oracle's single worst
 operational feature.

 I've run into cases where people have suffered horribly bloated
 databases because of one ill-behaved connection.  Some companies
 don't want to be vulnerable to that and the disruption that
 recovery from that bloat causes.

No doubt, preventing bloat is a good thing, but that doesn't mean this
is the best API we could create for the issue.  The proposition this
patch offers to DBAs is: You can turn this knob to reduce bloat by some
hard-to-quantify factor.  The cost is that some long-running transactions
might fail.  You won't know which ones are at risk, the failures won't be
the same from time to time, and you won't be able to do anything to spare
high-value transactions from that fate except by turning that knob back
again globally (which requires a database restart).  Maybe refugees from
Oracle will think that sounds good, but nobody else will.

I wonder if we couldn't achieve largely the same positive effects through
adding a simple transaction-level timeout option.  That would be far
easier for users to understand and manage, it would be trivial to allow
specific high-value transactions to run with a laxer setting, it does not
create any implementation-detail-dependent behavior that we'd be having to
explain to users forevermore, and (not incidentally) it would be a lot
simpler and more trustworthy to implement.  There's no well-defined
correlation between your setting and the net effect on database bloat,
but that's true with the snapshot too old approach as well.
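
To make the comparison concrete, the user-facing shape of such an option
might be roughly the following (transaction_timeout is hypothetical;
statement_timeout and lock_timeout are what exist today):

-- existing, per statement:
SET statement_timeout = '30s';

-- hypothetical, per transaction, along the lines suggested above:
SET transaction_timeout = '5min';       -- strict default for the session
BEGIN;
SET LOCAL transaction_timeout = '1h';   -- spare a known high-value transaction
-- ... long-running work ...
COMMIT;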

regards, tom lane




Re: [HACKERS] mogrify and indent features for jsonb

2015-02-15 Thread Sehrope Sarkuni
For jsonb_indent, how about having it match up closer to the
JavaScript JSON.stringify(value, replacer, space)[1]? That way a user
can specify the indentation level and optionally filter the fields
they'd like to output.

In JS, the replacer parameter can be either a JS function or an
array of property names. I don't think the former is really possible
(in a SQL callable function) but the latter would be a text[]. The
space parameter can be either a string (used directly) or a number
(corresponding number of spaces).

The PG function signatures would be something like:

CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, replacer text[],
space text)
CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, replacer text[],
space int)

For convenience we could also include overloads with replacer removed
(since most people probably want the entire object):

CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, space text)
CREATE OR REPLACE FUNCTION jsonb_stringify(obj jsonb, space int)

Having json_stringify versions of these would be useful as well.
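
A usage sketch, assuming the signatures above (jsonb_stringify is of course
hypothetical at this point):

-- pretty-print with two-space indentation:
SELECT jsonb_stringify('{"a": 1, "b": {"c": [2, 3]}}'::jsonb, 2);

-- keep only the listed keys and indent with a tab:
SELECT jsonb_stringify('{"a": 1, "b": 2, "c": 3}'::jsonb,
                       ARRAY['a', 'c'], E'\t');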

Regards,
-- Sehrope Sarkuni

[1]: 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify




Re: [HACKERS] NOT NULL markings for BKI columns

2015-02-15 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 So, how about providing bootstrap infrastructure for marking columns as
 NOT NULL?

We've not desperately needed it up to now, but if you can think of a clean
representation, go for it.  I'd want to preserve the property that all
columns accessible via C structs are automatically NOT NULL though; too
much risk of breakage otherwise.

regards, tom lane




Re: [HACKERS] NOT NULL markings for BKI columns

2015-02-15 Thread Andres Freund
On 2015-02-15 12:31:10 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I was thinking of adding BKI_FORCENOTNULL, which would be
  specified on the attributes where you want it. The FORCE in there
  represents that the default choice is overridden.
 
 Where are you thinking of sticking that exactly, and will pgindent
 do something sane with it?

Hm, I was thinking about
	/* extversion should never be null, but the others can be. */
	text		extversion PG_FORCENOTNULL;	/* extension version name */
but pgindent then removes some of the space between text and extversion,
making it
	text extversion PG_FORCENOTNULL;	/* extension version name */
an alternative where it doesn't do that is
	text		PG_FORCENOTNULL(extversion);	/* extension version name */

Not sure what's the best way here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-15 Thread Tom Lane
Christoph Berg c...@df7cb.de writes:
 gcc5 is lurking in Debian experimental, and it's breaking initdb.

FYI, this is now fixed in Red Hat's rawhide version:
https://bugzilla.redhat.com/show_bug.cgi?id=1190978

Don't know what the update process is like for Debian's copy, but
maybe you could pester the appropriate people to absorb the referenced
upstream fix quickly.

regards, tom lane




Re: [HACKERS] restrict global access to be readonly

2015-02-15 Thread Florian Pflug
On Feb15, 2015, at 10:13 , David G Johnston david.g.johns...@gmail.com wrote:
 happy times wrote
 Sure, we can utilize the runtime parameter
 default_transaction_read_only; however, it does not restrict users from
 changing the transaction attribute to non-read-only mode, so it is not safe.
 
 ISTM that implementing a means to make this setting only super-user
 changeable would be a quick(er) solution to the problem - the decision as to
 what to disallow is already decided.
 
 It seems like it would have to be a separate GUC, but it would not require any
 new SQL; it would simply leverage the existing settings system to set up
 database/user values that only a super-user can change.

I've wished for a way to prevent regular users from changing specific settings
in the past. Maybe we could have

  ALTER DATABASE database FORCE parameter TO value
  ALTER ROLE role [ IN DATABASE db ] FORCE parameter TO value

In postgresql.conf, we could use a syntax like

  parameter =!= value

to indicate that the parameter value can only be changed by super-users.

We'd have to figure out what happens if a database- or role-specific
FORCEd setting attempts to override a value already FORCEd in postgresql.conf.

Ideally, we'd allow database- or role-specific settings created by super-users
to override previously FORCEd values, but that would require us to store the
role that creates such settings in pg_db_role_setting. For SET clauses attached
to functions, we'd complain if they attempt to change a FORCEd value, unless
they are called by a super-user, or are marked SECURITY DEFINER and owned by
a super-user.
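
Tying that back to the use case upthread, usage might then look like this
(proposed syntax only, nothing here exists today; the database and role
names are placeholders):

ALTER DATABASE reporting FORCE default_transaction_read_only TO on;
ALTER ROLE app_user IN DATABASE reporting FORCE work_mem TO '64MB';
-- and in postgresql.conf:
--   default_transaction_read_only =!= on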

best regards,
Florian Pflug





Re: [HACKERS] restrict global access to be readonly

2015-02-15 Thread Andres Freund
On 2015-02-14 17:28:38 -0600, Jim Nasby wrote:
 Throw an error in AssignTransactionId/GetNewTransactionId? I see 4 calls to
 Get*TransactionId in logical replication, though arguably if we're fixing
 that we should look at doing something special for Slony and the
 likes.

I don't think there are any xid assignments in the logical decoding
code. There's a couple error checks erroring out if an xid has been
assigned, but those use GetTopTransactionIdIfAny(), i.e. don't assign an
id.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] New CF app deployment

2015-02-15 Thread Magnus Hagander
On Sat, Feb 14, 2015 at 6:42 PM, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Feb 14, 2015 at 7:29 AM, Magnus Hagander mag...@hagander.net
 wrote:
  Ok, I've pushed an attempt at doing this.
 
  For each mailthread, you can now create annotations. Each annotation is
  connected to a mail in the thread, and has a free text comment field. The
  message will then automatically be highlighted out of the archives and
 a
  direct link to the message include, alongside the text comment.
 
  I *think* this is approximately what people wanted, so let's give it a
 shot.
  If you have a chance, please test it out and comment as soon as you can
 - if
  I'm completely off track with how it's done, we should back it out and
 try a
  different way before we start putting actual valuable data into it, so we
  don't end up with multiple different ways of doing it in the end...

 Hmm.  This kind of looks like the right idea, but it's hard to use,
 because all you've got to work with is the subject of the message
 (which is the same for all), the time it was sent, and the author.  In
 the old system, you could look at the message in the archives and then
 copy-and-paste in the message ID.  That's obviously a bit manual, but
 it worked.


Hmm. Agreed, I think my test thread was just too short to do that.

I've added a box where you can copy/paste the message-id. Back to being
manual, but it works.




 I suppose what would be really spiffy is if the integration of the
 archives and the CF app could be such that you could see the whole
 message and then click annotate this message.



That would be quite spiffy. A bit more work though, so that will have to go
on the longer term todo :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-02-15 Thread Tom Lane
Here's an updated version of the patch I sent before.  Notable changes:

* I switched over to calling deserialized objects expanded objects,
and the default representation is now called flat or flattened instead
of reserialized.  Per suggestion from Robert.

* I got rid of the bit about detecting read-write pointers by address
comparison.  Instead there are now two vartag values for R/W and R/O
pointers.  After further reflection I concluded that my previous worry
about wanting copied pointers to automatically become read-only was
probably wrong, so there's no need for extra confusion here.

* I added support for extracting array elements from expanded values
(array_ref).

* I hacked plpgsql to force values of array-type variables into expanded
form; this is needed to get any win from the array_ref change if the
function doesn't do any assignments to elements of the array.  This is an
improvement over the original patch, which hardwired array_set to force
expansion, but I remain unsatisfied with it as a long-term answer.  It's
not clear that it's always a win to do this (but the tradeoff will change
as we convert more array support functions to handle expanded inputs, so
it's probably not worth getting too excited about that aspect of it yet).
A bigger complaint is that this approach cannot fix things for non-builtin
types such as hstore.  I'm hesitant to add a pg_type column carrying an
expansion function OID, but there may be no other workable answer for
extension types.

The patch as it stands is able to do nice things with

create or replace function arraysetnum(n int) returns numeric[] as $$
declare res numeric[] := '{}';
begin
  for i in 1 .. n loop
res[i] := i;
  end loop;
  return res;
end
$$ language plpgsql strict;

create or replace function arraysumnum(arr numeric[]) returns numeric as $$
declare res numeric := 0;
begin
  for i in array_lower(arr, 1) .. array_upper(arr, 1) loop
res := res + arr[i];
  end loop;
  return res;
end
$$ language plpgsql strict;

regression=# select arraysumnum(arraysetnum(10));
 arraysumnum 
-------------
  55
(1 row)

Time: 304.336 ms

(versus approximately 1 minute in 9.4, although these numbers are for
cassert builds so should be taken with a grain of salt.)  There are
still a couple more flattening/expansion conversions than I'd like,
in particular the array returned by arraysetnum() gets flattened on its
way out, which would be good to avoid.

I'm going to stick this into the commitfest even though it's not really
close to being committable; I see some other people doing likewise with
their pet patches ;-).  What it could particularly do with some reviewing
help on is exploring the performance changes it creates; what cases does
it make substantially worse?

regards, tom lane

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 867035d..860ad78 100644
*** a/src/backend/access/common/heaptuple.c
--- b/src/backend/access/common/heaptuple.c
***
*** 60,65 
--- 60,66 
  #include "access/sysattr.h"
  #include "access/tuptoaster.h"
  #include "executor/tuptable.h"
+ #include "utils/expandeddatum.h"
  
  
  /* Does att's datatype allow packing into the 1-byte-header varlena format? */
*** heap_compute_data_size(TupleDesc tupleDe
*** 93,105 
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		Datum		val;
  
  		if (isnull[i])
  			continue;
  
  		val = values[i];
  
! 		if (ATT_IS_PACKABLE(att[i]) &&
  			VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
  		{
  			/*
--- 94,108 
  	for (i = 0; i < numberOfAttributes; i++)
  	{
  		Datum		val;
+ 		Form_pg_attribute atti;
  
  		if (isnull[i])
  			continue;
  
  		val = values[i];
+ 		atti = att[i];
  
! 		if (ATT_IS_PACKABLE(atti) &&
  			VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
  		{
  			/*
*** heap_compute_data_size(TupleDesc tupleDe
*** 108,118 
  			 */
  			data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
  		}
  		else
  		{
! 			data_length = att_align_datum(data_length, att[i]->attalign,
! 		  att[i]->attlen, val);
! 			data_length = att_addlength_datum(data_length, att[i]->attlen,
  			  val);
  		}
  	}
--- 111,131 
  			 */
  			data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
  		}
+ 		else if (atti->attlen == -1 &&
+ 				 VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ 		{
+ 			/*
+ 			 * we want to flatten the expanded value so that the constructed
+ 			 * tuple doesn't depend on it
+ 			 */
+ 			data_length = att_align_nominal(data_length, atti->attalign);
+ 			data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ 		}
  		else
  		{
! 			data_length = att_align_datum(data_length, atti->attalign,
! 		  atti->attlen, val);
! 			data_length = att_addlength_datum(data_length, atti->attlen,
  			  val);
  		}
  	}
*** heap_fill_tuple(TupleDesc tupleDesc,
*** 203,212 
  			*infomask |= 

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-02-15 Thread Robert Haas
On Sun, Feb 15, 2015 at 6:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm going to stick this into the commitfest even though it's not really
 close to being committable; I see some other people doing likewise with
 their pet patches ;-).  What it could particularly do with some reviewing
 help on is exploring the performance changes it creates; what cases does
 it make substantially worse?

It's perfectly reasonable to add stuff that isn't committable yet to
the CF app; the point of the CF app is to track what needs reviewing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Really bad blowups with hash outer join and nulls

2015-02-15 Thread Andrew Gierth
>>>>> "Tom" == Tom Lane t...@sss.pgh.pa.us writes:

  A quick test suggests that initializing the hash value to ~0 rather
  than 0 has a curious effect: the number of batches still explodes,
  but the performance does not suffer the same way. (I think because
  almost all the batches end up empty.) I think this is worth doing
  even in the absence of a more general solution; nulls are common
  enough and important enough that they shouldn't be the worst-case
  value if it can be avoided.

 Tom> I think that's unlikely to be a path to a good solution.

It wasn't really intended to be.

 Tom> At least part of the problem here is that
 Tom> estimate_hash_bucketsize() supposes that nulls can be ignored ---
 Tom> which is normally true, and invalidates your claim that they're
 Tom> common.  But in a RIGHT JOIN situation, they need to be considered
 Tom> as if they were regular keys.  That would probably be sufficient
 Tom> to dissuade the planner from choosing a hash join in this example.

I've now tried the attached patch to correct the bucketsize estimates,
and it does indeed stop the planner from considering the offending path
(in this case it just does the join the other way round).

One thing I couldn't immediately see how to do was account for the case
where there are a lot of nulls in the table but a strict qual (or an IS
NOT NULL) filters them out; this patch will be overly pessimistic about
such cases. Do estimates normally try and take things like this into
account? I didn't find any other relevant examples.
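
(For example, something of this shape, where the join column is mostly NULL
in the table but the qual throws those rows away before they ever reach the
join; names are illustrative:)

SELECT *
FROM small s
RIGHT JOIN big b ON s.id = b.ref
WHERE b.ref IS NOT NULL;  -- strict qual removes the NULLs, so counting them
                          -- as regular keys overestimates the bucket size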

 Tom> There may also be something we can do in the executor, but it
 Tom> would take closer analysis to figure out what's going wrong.  I
 Tom> don't think kluging the behavior for NULL in particular is the
 Tom> answer.

The point with nulls is that a hash value of 0 is currently special in
two distinct ways: it's always in batch 0 and bucket 0 regardless of how
many batches and buckets there are, and it's the result of hashing a
null.  These two special cases interact in a worst-case manner, so it
seems worthwhile to avoid that.

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 020558b..1723fd3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2505,6 +2505,7 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 	SpecialJoinInfo *sjinfo,
 	SemiAntiJoinFactors *semifactors)
 {
+	JoinType	jointype = path->jpath.jointype;
 	Path	   *outer_path = path->jpath.outerjoinpath;
 	Path	   *inner_path = path->jpath.innerjoinpath;
 	double		outer_path_rows = outer_path->rows;
@@ -2560,6 +2561,9 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 		foreach(hcl, hashclauses)
 		{
 			RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(hcl);
+			bool		keep_nulls = (jointype == JOIN_RIGHT
+	  || jointype == JOIN_FULL
+	  || !op_strict(restrictinfo->hashjoinoperator));
 			Selectivity thisbucketsize;
 
 			Assert(IsA(restrictinfo, RestrictInfo));
@@ -2583,7 +2587,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 	thisbucketsize =
 		estimate_hash_bucketsize(root,
 		   get_rightop(restrictinfo->clause),
- virtualbuckets);
+ virtualbuckets,
+ keep_nulls);
 	restrictinfo->right_bucketsize = thisbucketsize;
 }
 			}
@@ -2599,7 +2604,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path,
 	thisbucketsize =
 		estimate_hash_bucketsize(root,
 			get_leftop(restrictinfo->clause),
- virtualbuckets);
+ virtualbuckets,
+ keep_nulls);
 	restrictinfo->left_bucketsize = thisbucketsize;
 }
 			}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 1ba103c..e8660bc 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3435,9 +3435,13 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows)
  * discourage use of a hash rather strongly if the inner relation is large,
  * which is what we want.  We do not want to hash unless we know that the
  * inner rel is well-dispersed (or the alternatives seem much worse).
+ *
+ * If keep_nulls is true, then we have to treat nulls as a real value, so
+ * adjust all results according to stanullfrac if available.
  */
 Selectivity
-estimate_hash_bucketsize(PlannerInfo *root, Node *hashkey, double nbuckets)
+estimate_hash_bucketsize(PlannerInfo *root, Node *hashkey, double nbuckets,
+		 bool keep_nulls)
 {
 	VariableStatData vardata;
 	double		estfract,
@@ -3454,13 +3458,6 @@ estimate_hash_bucketsize(PlannerInfo *root, Node *hashkey, double nbuckets)
 	/* Get number of distinct values */
 	ndistinct = get_variable_numdistinct(vardata, isdefault);
 
-	/* If ndistinct isn't real, punt and return 0.1, per comments above */
-	if (isdefault)
-	{
-		ReleaseVariableStats(vardata);
-		return (Selectivity) 0.1;
-	}
-
 	/* Get fraction that 

Re: [HACKERS] Replication identifiers, take 4

2015-02-15 Thread Andres Freund
Hi,

Here's my next attempt at producing something we can agree
upon.

The major change that might achieve that is that I've now provided a
separate method to store the origin_id of a node. I've made it
conditional on !REPLICATION_IDENTIFIER_REUSE_PADDING, to show both
paths. That new method uses Heikki's xlog rework to dynamically add the
origin to the record if an origin is set up. That works surprisingly
simply.

Other changes:

* Locking to prevent several backends from replaying changes at the same
  time. This is actually overly restrictive in some cases, but I think
  good enough for now.
* Logical decoding grew a filter_by_origin callback that allows ignoring
  changes that were replayed on a remote system. Such filters are
  executed before much is done with records, potentially saving a fair
  bit of costs.
* Rebased. That took a bit due to the xlog and other changes.
* A couple more SQL interface functions (like dropping a replication
  identifier).

I also want to quickly recap replication identifiers, given that
in-person conversations with several people proved that the concept was
slightly misunderstood:

Think about a logical replication solution trying to replay changes. The
postmaster into which the data is replayed crashes every now and
then. Replication identifiers allow you to do something like:

do_replication()
{
source = ConnectToSourceSystem('mysource');
target = ConnectToTargetSystem('target');

# mark that we're replaying
target.exec($$SELECT 
pg_replication_identifier_setup_replaying_from('myrep_mysource')$$);
# get how far we've replayed last time round
remote_lsn = target.exec($$SELECT remote_lsn FROM  
pg_get_replication_identifier_progress WHERE external_id = 'myrep_mysource'$$);

# and now replay changes
copystream = source.exec('START_LOGICAL_REPLICATION SLOT ... START %x', 
remote_lsn);

while (record = copystream.get_record())
{
   if (record.type = 'begin')
   {
  target.exec('BEGIN');
  # setup the position of this individual xact
  target.exec('SELECT pg_replication_identifier_setup_tx_origin($1, 
$2);',
  record.origin_lsn, record.origin_commit_timestamp);
   }
   else if (record.type = 'change')
  target.exec(record.change_sql)
   else if (record.type = 'commit')
  target.exec('COMMIT');
}
}

A non-pseudocode version of the above would be safe against crashes of
both the source and the target system. If the target system crashes, the
replication identifier logic will recover how far we replayed during
crash recovery. If the source system crashes/disconnects we'll have the
current value in memory. Note that this works perfectly well if the
target system (and, obviously, the source system) uses
synchronous_commit = off - we'll not miss any changes.

Furthermore, the fact that the origin of records is recorded makes it
possible to avoid decoding them in logical decoding. That has both
efficiency advantages (we can do so before they are stored in memory/disk) and
functionality advantages. Imagine using a logical replication solution
to replicate inserts to a single table between two databases where
inserts are allowed on both - unless you prevent the replicated inserts
from being replicated again you obviously have a loop. This
infrastructure lets you avoid that.

The SQL interface consists out of:
# manage existence of identifiers
internal_id pg_replication_identifier_create(external_id);
void pg_replication_identifier_drop(external_id);

# replay management
void pg_replication_identifier_setup_replaying_from(external_id);
void pg_replication_identifier_reset_replaying_from();
bool pg_replication_identifier_is_replaying();
void pg_replication_identifier_setup_tx_origin(remote_lsn, remote_commit_time);

# replication progress status view
SELECT * FROM pg_replication_identifier_progress;

# replication identifiers
SELECT * FROM pg_replication_identifier;
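
Put together at the SQL level, one replay cycle looks roughly like this
(function names as listed above; the identifier, LSN and timestamp are
placeholders):

SELECT pg_replication_identifier_create('myrep_mysource');
SELECT pg_replication_identifier_setup_replaying_from('myrep_mysource');

BEGIN;
SELECT pg_replication_identifier_setup_tx_origin('0/16B3748',
                                                 '2015-02-15 12:00:00+00');
-- ... apply the decoded changes of the remote transaction ...
COMMIT;

SELECT * FROM pg_replication_identifier_progress;  -- how far we have replayed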


Petr has developed (for UDR, i.e. logical replication on top of 9.4) a
SQL reimplementation of replication identifiers, and that has proven that
for busier workloads doing a table update to store the replication
progress indeed has a noticeable overhead, especially if there's some
longer-running activity on the standby.

The bigger questions I have are:

1) Where to store the origin. I personally still think that using the
   padding is fine. Now that I have proven that it's pretty simple to
   store additional information the argument that it might be needed for
   something else doesn't really hold anymore. But I can live with the
   other solution as well - 3 bytes additional overhead ain't so bad.

2) If we go with the !REPLICATION_IDENTIFIER_REUSE_PADDING solution, do
   we want to store the origin only on relevant records? That'd be
   XLOG_HEAP_INSERT/XLOG_HEAPMULTI_INSERT/XLOG_HEAP_UPDATE //
   XLOG_XACT_COMMIT/XLOG_XACT_COMMIT_PREPARED. I'm thinking of something
   

Re: [HACKERS] Replication identifiers, take 4

2015-02-15 Thread Andres Freund
On 2015-02-16 01:21:55 +0100, Andres Freund wrote:
 Here's my next attempt at producing something we can agree
 upon.
 
 The major change that might achieve that is that I've now provided a
 separate method to store the origin_id of a node. I've made it
 conditional on !REPLICATION_IDENTIFIER_REUSE_PADDING, to show both
 paths. That new method uses Heikki's xlog rework to dynamically add the
 origin to the record if a origin is set up. That works surprisingly
 simply.

 Other changes:
 
 * Locking to prevent several backends from replaying changes at the same
   time. This is actually overly restrictive in some cases, but I think
   good enough for now.
 * Logical decoding grew a filter_by_origin callback that allows ignoring
   changes that were replayed on a remote system. Such filters are
   executed before much is done with records, potentially saving a fair
   bit of costs.
 * Rebased. That took a bit due to the xlog and other changes.
 * A couple more SQL interface functions (like dropping a replication
   identifier).

And here an actual patch.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 268d52cac6bf7fe1c019fd68248853c7c7ae18b1 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 16 Feb 2015 01:22:08 +0100
Subject: [PATCH] Introduce replication identifiers to keep track of
 replication progress: v0.6

---
 contrib/test_decoding/Makefile |3 +-
 contrib/test_decoding/expected/replident.out   |   84 ++
 contrib/test_decoding/sql/replident.sql|   40 +
 contrib/test_decoding/test_decoding.c  |   28 +
 src/backend/access/rmgrdesc/xactdesc.c |   17 +-
 src/backend/access/transam/xact.c  |   64 +-
 src/backend/access/transam/xlog.c  |   34 +-
 src/backend/access/transam/xloginsert.c|   22 +-
 src/backend/access/transam/xlogreader.c|   10 +
 src/backend/bootstrap/bootstrap.c  |5 +-
 src/backend/catalog/Makefile   |2 +-
 src/backend/catalog/catalog.c  |8 +-
 src/backend/catalog/system_views.sql   |7 +
 src/backend/replication/logical/Makefile   |3 +-
 src/backend/replication/logical/decode.c   |   63 +-
 src/backend/replication/logical/logical.c  |5 +
 src/backend/replication/logical/reorderbuffer.c|5 +-
 .../replication/logical/replication_identifier.c   | 1190 
 src/backend/storage/ipc/ipci.c |3 +
 src/backend/utils/cache/syscache.c |   23 +
 src/backend/utils/misc/guc.c   |1 +
 src/bin/initdb/initdb.c|1 +
 src/bin/pg_resetxlog/pg_resetxlog.c|3 +
 src/include/access/xact.h  |   10 +-
 src/include/access/xlog.h  |1 +
 src/include/access/xlogdefs.h  |6 +
 src/include/access/xlogreader.h|9 +
 src/include/access/xlogrecord.h|5 +-
 src/include/catalog/indexing.h |6 +
 src/include/catalog/pg_proc.h  |   28 +
 src/include/catalog/pg_replication_identifier.h|   75 ++
 src/include/pg_config_manual.h |6 +
 src/include/replication/output_plugin.h|8 +
 src/include/replication/reorderbuffer.h|8 +-
 src/include/replication/replication_identifier.h   |   58 +
 src/include/utils/syscache.h   |2 +
 src/test/regress/expected/rules.out|5 +
 src/test/regress/expected/sanity_check.out |1 +
 38 files changed, 1816 insertions(+), 33 deletions(-)
 create mode 100644 contrib/test_decoding/expected/replident.out
 create mode 100644 contrib/test_decoding/sql/replident.sql
 create mode 100644 src/backend/replication/logical/replication_identifier.c
 create mode 100644 src/include/catalog/pg_replication_identifier.h
 create mode 100644 src/include/replication/replication_identifier.h

diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index 438be44..f8334cc 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -37,7 +37,8 @@ submake-isolation:
 submake-test_decoding:
 	$(MAKE) -C $(top_builddir)/contrib/test_decoding
 
-REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact decoding_into_rel binary prepared
+REGRESSCHECKS=ddl rewrite toast permissions decoding_in_xact decoding_into_rel \
+	binary prepared replident
 
 regresscheck: all | submake-regress submake-test_decoding
 	$(MKDIR_P) regression_output
diff --git a/contrib/test_decoding/expected/replident.out b/contrib/test_decoding/expected/replident.out
new file mode 100644
index 000..1c508a5
--- /dev/null
+++ 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-15 Thread Peter Geoghegan
On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Thanks for taking a look at it. That's somewhat cleaned up in the
 attached patchseries - V2.2. This has been rebased to repair the minor
 bit-rot pointed out by Thom.

 I don't really have the energy to review this patch in one batch, so I'd
 really like to split this into two:

I think we're all feeling worn out at this point, by this patch and by
others. I do appreciate your making the effort.

 1. Solve the existing problem with exclusion constraints that if two
 insertions happen concurrently, one of them might be aborted with deadlock
 detected error, rather then conflicting key violation error. That really
 wouldn't be worth fixing otherwise, but it happens to be a useful stepping
 stone for this upsert patch.

 2. All the rest.

I think we need a more pragmatic approach to dealing with the
exclusion constraint problems.

 I took a stab at extracting the parts needed to do 1. See attached. I didn't
 modify ExecUpdate to use speculative insertions, so the issue remains for
 UPDATEs, but that's easy to add.

Cool.

 I did not solve the potential for livelocks (discussed at
 http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com).
 The patch always super-deletes the already-inserted tuple on conflict, and
 then waits for the other inserter. That would be nice to fix, using one of
 the ideas from that thread, or something else.

How about we don't super-delete at all with exclusion constraints? I'd
be willing to accept unprincipled deadlocks for exclusion constraints,
because they already exist today for UPSERT/NOSERT type use cases, and
with idiomatic usage seem far less likely for the IGNORE variant
(especially with exclusion constraints). I can see people using ON
CONFLICT UPDATE where they'd almost or actually be better off using a
plain UPDATE - that's quite a different situation. I find livelocks to
be a *very* scary prospect, and I don't think the remediations that
were mentioned are going to fly. It's just too messy, and too much of
a niche use case. TBH I am skeptical of the idea that we can fix
exclusion constraints properly in this way at all, at least not
without the exponential backoff thing, which just seems horrible.

 We never really debated the options for how to do the speculative insertion
 and super-deletion. This patch is still based on the quick  dirty demo
 patches I posted about a year ago, in response to issues you found with
 earlier versions. That might be OK - maybe I really hit the nail on
 designing those things and got them right on first try - but more likely
 there are better alternatives.

Intuitively, it seem likely that you're right here. However, it was
necessary to work through the approach to see what the problems are.
For example, the need for modifications to tqual.c became apparent
only through putting a full implementation of ON CONFLICT UPDATE
through various tests. In general, I've emphasized that the problems
with any given value locking implementation are non-obvious. Anyone
who thinks that he can see all the problems with his approach to value
locking without having a real implementation that is tested for
problems like unprincipled deadlocks is probably wrong.

That's sort of where I'm coming from with suggesting we allow
unprincipled deadlocks with exclusion constraints. I'm not confident
that we can find a perfect solution, and know that it's a perfect
solution. It's too odd, and too niche a requirement. Although you
understood what I was on about when I first talked about unprincipled
deadlocks, I think that acceptance of that idea came much later from
other people, because it's damn complicated. It's not worth it to add
some weird Dining philosophers exponential backoff thing to make
sure that the IGNORE variant when used with exclusion constraints can
never deadlock in an unprincipled way, but if it is worth it then it
seems unreasonable to suppose that this patch needs to solve that
pre-existing problem. No?

If we do something like an exponential backoff, which I think might
work, I fear that that kind of yak-shaving will leave us with
something impossible to justify committing; a total mess. Better the
devil you know (possible unprincipled deadlocks with the IGNORE
variant + exclusion constraints).

 Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for
 every insertion? That seems bad for performance reasons. Also, are we happy
 with adding the new fields to the proc array? Another alternative that was
 discussed was storing the speculative insertion token on the heap tuple
 itself. (See
 http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com)

Whatever works, really. I can't say that the performance implications
of acquiring that hwlock are at the forefront of my mind. I never
found that to be a big problem on an 8 core box, relative to vanilla
INSERTs, FWIW - lock contention is not 

Re: [HACKERS] Issue installing doc tools on OSX

2015-02-15 Thread Peter Eisentraut
On 2/15/15 6:31 PM, David Steele wrote:
 I had a problem installing the doc tools following the directions for
 OSX at http://www.postgresql.org/docs/9.4/static/docguide-toolsets.html.
  I'm running OSX Yosemite.
 
 I got it to work by changing the command from:
 
 sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2
 docbook-xsl libxslt openjade opensp
 
 To:
 
 sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2
 docbook-xsl libxslt opensp openjade
 
 I didn't capture the error message unfortunately, but it was more or
 less: unresolved dependency opensp while installing openjade.

That seems a bit incredible, since port should be able to resolve the
dependencies by itself.  I suggest that this should be reported as a bug
to MacPorts.

Also note that the other listed packages are also not ordered in
dependency order.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CF app deployment

2015-02-15 Thread Peter Eisentraut
On 2/14/15 7:30 AM, Magnus Hagander wrote:
 On Mon, Feb 9, 2015 at 4:56 PM, Robert Haas robertmh...@gmail.com
 Can we make it smarter, so that the kinds of things people produce
 intending for them to be patches are thought by the CF app to be
 patches?
 
 
 Doing it wouldn't be too hard, as the code right now is simply:
 
 # Attempt to identify the file using magic information
 mtype = mag.buffer(contents)
 if mtype.startswith('text/x-diff'):
 a.ispatch = True
 else:
 a.ispatch = False
 
 
 (where mag is the API call into the magic module)
 
 So we could easily add for example our own regexp parsing or so. The
 question is do we want to - because we'll have to maintain it as well.
 But I guess if we have a restricted enough set of rules, we can probably
 live with that.

As I had described in
http://www.postgresql.org/message-id/54dd2413.8030...@gmx.net, this is
all but impossible.  The above rule is certainly completely detached
from the reality of what people actually send in.  If you are just
ignoring patches that don't match your rule set, this is not going to
work very well.

I think the old system where the patch submitter declared, "this message
contains my patch", is the only one that will work.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] restrict global access to be readonly

2015-02-15 Thread Peter Eisentraut
On 2/14/15 7:24 PM, Tom Lane wrote:
 Another possibility that would be attractive for replication-related
 use-cases would be "nothing that generates WAL thank you very much".

This would be useful, as it essentially simulates a hot standby.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Really bad blowups with hash outer join and nulls

2015-02-15 Thread Tomas Vondra
Hi,

On 16.2.2015 00:50, Andrew Gierth wrote:
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 I've now tried the attached patch to correct the bucketsize
 estimates, and it does indeed stop the planner from considering the
 offending path (in this case it just does the join the other way
 round).
 
 One thing I couldn't immediately see how to do was account for the
 case where there are a lot of nulls in the table but a strict qual
 (or an IS NOT NULL) filters them out; this patch will be overly
 pessimistic about such cases. Do estimates normally try and take
 things like this into account? I didn't find any other relevant
 examples.

Improving the estimates is always good, but it's not going to fix the
case of non-NULL values (it shouldn't be all that difficult to create
such examples with a value whose hash starts with a bunch of zeroes).

  Tom There may also be something we can do in the executor, but it
  Tom would take closer analysis to figure out what's going wrong.  I
  Tom don't think kluging the behavior for NULL in particular is the
  Tom answer.
 
 The point with nulls is that a hash value of 0 is currently special
 in two distinct ways: it's always in batch 0 and bucket 0 regardless
 of how many batches and buckets there are, and it's the result of
 hashing a null. These two special cases interact in a worst-case
 manner, so it seems worthwhile to avoid that.

I think you're right this is a flaw in general - all it takes is a
sufficiently common value with a hash value falling into the first batch
(i.e. either 0 or starting with a lot of 0 bits, thus giving hashno==0).
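
For illustration, here is a standalone sketch -- not the actual nodeHash.c
code, the names are mine -- of how a hash value is split into bucket and
batch numbers. Both nbuckets and nbatch are powers of two, so a hash value
of 0 (or one whose low bits are all zero) keeps landing in batch 0 no matter
how many times nbatch doubles:

#include <stdint.h>
#include <stdio.h>

static void
bucket_and_batch(uint32_t hashvalue, int nbuckets, int log2_nbuckets,
                 int nbatch, int *bucketno, int *batchno)
{
    *bucketno = hashvalue & (nbuckets - 1);                  /* low bits pick the bucket */
    *batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1);  /* next bits pick the batch */
}

int
main(void)
{
    int     bucketno, batchno;

    /* even with 131072 batches, hash value 0 still maps to batch 0 */
    bucket_and_batch(0, 65536, 16, 131072, &bucketno, &batchno);
    printf("bucket=%d batch=%d\n", bucketno, batchno);
    return 0;
}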

I think this might be solved by relaxing the check a bit. Currently we
do this (see nodeHash.c:735):

if (nfreed == 0 || nfreed == ninmemory)
{
    hashtable->growEnabled = false;
}

which only disables nbatch growth if *all* the tuples remain in the
batch. That's rather strict, and it takes a single tuple to break this.

With each nbatch increase we expect 50:50 split, i.e. 1/2 the tuples
staying in the batch, 1/2 moved to the new one. So a much higher ratio
is suspicious and most likely mean the same hash value, so what if we
did something like this:

if ((nfreed >= 0.9 * (nfreed + ninmemory)) ||
    (nfreed <= 0.1 * (nfreed + ninmemory)))
{
    hashtable->growEnabled = false;
}

which disables nbatch growth if either >=90% tuples remained in the
first batch, or >=90% tuples were moved from it. The exact thresholds
might be set a bit differently, but I think 10:90 sounds about good.

Trivial patch implementing this attached - with it the explain analyze
looks like this:

test=# explain analyze select status, count(*) from q3 left join m3 on
(m3.id=q3.id) group by status;

   QUERY PLAN
--
 HashAggregate  (cost=64717.63..64717.67 rows=4 width=4) (actual
time=514.619..514.630 rows=5 loops=1)
   Group Key: m3.status
   -  Hash Right Join  (cost=18100.00..62217.63 rows=50 width=4)
(actual time=75.260..467.911 rows=500108 loops=1)
 Hash Cond: (m3.id = q3.id)
 -  Seq Scan on m3  (cost=0.00..18334.00 rows=100 width=37)
(actual time=0.003..91.799 rows=100 loops=1)
 -  Hash  (cost=7943.00..7943.00 rows=50 width=33)
   (actual time=74.916..74.916 rows=50 loops=1)
   Buckets: 65536 (originally 65536)
   Batches: 32 (originally 16)  Memory Usage: 8824kB
   -  Seq Scan on q3
   (cost=0.00..7943.00 rows=50 width=33)
   (actual time=0.005..27.395 rows=50 loops=1)
 Planning time: 0.172 ms
 Execution time: 514.663 ms
(10 rows)

Without the patch this runs in ~240 seconds and the number of batches
explodes to ~131k.

In theory it might happen that there are just a few hash values and all of
them are exactly the same within the first N bits (thus falling into the
first batch), but N+1 bits are different. So the next split would
actually help. But I think we can leave that up to probability, just
like all the hash code.


-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
index abd70b3..6b3f471 100644
--- a/src/backend/executor/nodeHash.c
+++ b/src/backend/executor/nodeHash.c
@@ -732,7 +732,8 @@ ExecHashIncreaseNumBatches(HashJoinTable hashtable)
 	 * group any more finely. We have to just gut it out and hope the server
 	 * has enough RAM.
 	 */
-	if (nfreed == 0 || nfreed == ninmemory)
+	if ((nfreed >= 0.9 * (nfreed + ninmemory)) ||
+		(nfreed <= 0.1 * (nfreed + ninmemory)))
 	{
 		hashtable->growEnabled = false;
 #ifdef HJDEBUG

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

[HACKERS] EXPERIMENTAL: mmap-based memory context / allocator

2015-02-15 Thread Tomas Vondra
Hi!

While playing with the memory context internals some time ago, I've been
wondering if there are better ways to allocate memory from the kernel -
either tweaking libc somehow, or maybe interacting with kernel directly.

I mostly forgot about that topic, but after the local conference last
week we went to a pub and one of the things we discussed over a beer was
how complex and unintuitive the memory stuff is, because of the libc
heuristics, 'sbrk' properties [1] and behavior in the presence of holes,
OOM etc.

The virtual memory system should handle this to a large degree, but I've
repeatedly run into problems where that was not the case (for example the
memory is still part of the virtual address space, and thus counted by OOM).

One of the wilder ideas (I mentioned beer was involved!) was a memory
allocator based on mmap [2], bypassing the libc malloc implementation
altogether. mmap() has some nice features (e.g. no issues with returning
memory back to the kernel, which may be problem with sbrk). So I hacked
a bit and switched the AllocSet implementation to mmap().

And it works surprisingly well, so here is an experimental patch for
comments whether this really is a good idea etc. Some parts of the patch
are a bit dirty and I've only tested it on x86.


Notes
-

(1) The main changes are mostly trivial, rewriting malloc()/free() to
mmap()/munmap(), plus related changes (e.g. mmap() returns (void*)-1
instead of NULL in case of failure). A minimal sketch follows these notes.

(2) A significant difference is that mmap() can't allocate blocks
smaller than page size, which is 4kB on x86. This means the
smallest context is 4kB (instead of 1kB), and also affects the
growth of block size (it can only grow to 8kB). So this changes
the AllocSet internal behavior a bit.

(3) As this bypasses libc, it can't use the libc freelists (which are
used by malloc). To compensate for this, there's a simple
process-level freelist of blocks, shared by all memory contexts.
This cache has a limited capacity (roughly 4MB per).

(4) Some of the comments are obsolete, still referencing malloc/free.
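
Here is that minimal sketch of the block handling from notes (1)-(3) --
simplified, with names of my choosing, not the patch code itself:

#include <stddef.h>
#include <sys/mman.h>

#define MMAP_PAGE_SIZE 4096            /* see note (2): blocks are whole pages */

static void *
block_alloc(size_t size)
{
    void   *p;

    /* round the request up to a whole number of pages */
    size = (size + MMAP_PAGE_SIZE - 1) & ~((size_t) (MMAP_PAGE_SIZE - 1));

    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* note (1): mmap() reports failure as MAP_FAILED, i.e. (void *) -1, not NULL */
    return (p == MAP_FAILED) ? NULL : p;
}

static void
block_free(void *p, size_t size)
{
    munmap(p, size);                   /* pages go straight back to the kernel */
}

int
main(void)
{
    void   *block = block_alloc(8192);

    if (block != NULL)
        block_free(block, 8192);
    return 0;
}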


Benchmarks
--

I've done extensive testing and also benchmarking, and it seems to be no
slower than the current implementation, and in some cases is actually a
bit faster.

a) time pgbench -i -s 300

   - pgbench initialisation, measuring the COPY and the total duration.
   - averages of 3 runs (negligible variations between runs)

           COPY     total
 -------------------------
 master    0:26.22  1:22
 mmap      0:26.35  1:22

   Pretty much no difference.

b) pgbench -S -c 8 -j 8 -T 60

   - short read-only runs (1 minute) after a warmup
   - min, max, average of 8 runs

           average   min     max
 -------------------------------
 master    96785     95329   97981
 mmap      98279     97259   99671

   That's a rather consistent 1-2% improvement.

c) REINDEX pgbench_accounts_pkey

   - large maintenance_work_mem so that it's in-memory sort
   - average, min, max of 8 runs (duration in seconds)

           average   min    max
 -------------------------------
 master    10.35     9.64   13.56
 mmap       9.85     9.81    9.90

Again, mostly improvement, except for the minimum where the current
memory context was a bit faster. But overall the mmap-based one is
much more consistent.

Some of the differences may be due to allocating 4kB blocks from the
very start (while the current allocator starts with 1kB, then 2kB and
finally 4kB).

Ideas, opinions?


[1] http://linux.die.net/man/2/sbrk
[2] http://linux.die.net/man/2/mmap

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 0759e39..65e7f66 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -89,6 +89,8 @@
 #include "utils/memdebug.h"
 #include "utils/memutils.h"
 
+#include <sys/mman.h>
+
 /* Define this to detail debug alloc information */
 /* #define HAVE_ALLOCINFO */
 
@@ -117,13 +119,11 @@
  * wastage.)
  *
  */
-
+#define PAGE_SIZE			4096	/* getconf PAGE_SIZE */
 #define ALLOC_MINBITS		3	/* smallest chunk size is 8 bytes */
-#define ALLOCSET_NUM_FREELISTS	11
+#define ALLOCSET_NUM_FREELISTS	11	/* FIXME probably should be only 9 */
 #define ALLOC_CHUNK_LIMIT	(1 << (ALLOCSET_NUM_FREELISTS-1+ALLOC_MINBITS))
-/* Size of largest chunk that we use a fixed size for */
-#define ALLOC_CHUNK_FRACTION	4
-/* We allow chunks to be at most 1/4 of maxBlockSize (less overhead) */
+/* Size of largest chunk that we use a fixed size for (2kB) */
 
 /*
  * The first block allocated for an allocset has size initBlockSize.
@@ -242,6 +242,31 @@ typedef struct AllocChunkData
 #define AllocChunkGetPointer(chk)	\
 	((AllocPointer)(((char *)(chk)) + 

Re: [HACKERS] New CF app deployment

2015-02-15 Thread Peter Geoghegan
On Sun, Feb 15, 2015 at 8:36 AM, Magnus Hagander mag...@hagander.net wrote:
 Hmm.  This kind of looks like the right idea, but it's hard to use,
 because all you've got to work with is the subject of the message
 (which is the same for all), the time it was sent, and the author.  In
 the old system, you could look at the message in the archives and then
 copy-and-paste in the message ID.  That's obviously a bit manual, but
 it worked.


 Hmm. Agreed, I think my test-thread was just too short to do that.

 I've added a box where you can copy/paste the messageid. Back to the manual,
 but it works.


Thank you for correcting the issue in a timely manner. I appreciate the effort.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator

2015-02-15 Thread Tomas Vondra
On 15.2.2015 20:56, Heikki Linnakangas wrote:
 On 02/15/2015 08:57 PM, Tomas Vondra wrote:
 One of the wilder ideas (I mentined beer was involved!) was a memory
 allocator based on mmap [2], bypassing the libc malloc implementation
 altogether. mmap() has some nice features (e.g. no issues with returning
 memory back to the kernel, which may be problem with sbrk). So I hacked
 a bit and switched the AllocSet implementation to mmap().
 
 glibc's malloc() also uses mmap() for larger allocations. Precisely
 because those allocations can then be handed back to the OS. I don't
 think we'd want to use mmap() for small allocations either. Let's not
 re-invent malloc()..

malloc() does that only for allocations over MAP_THRESHOLD, which is
128kB by default. Vast majority of blocks we allocate are <= 8kB, so
mmap() almost never happens.

At least that's my understanding, I may be wrong of course.
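
(For reference only -- this is the stock glibc knob under discussion, not
anything the patch touches: the crossover point is tunable at runtime with
mallopt(), so even plain malloc() could be told to satisfy ordinary 8kB
block requests via mmap().)

#include <malloc.h>        /* glibc-specific */

int
main(void)
{
    /* lower the mmap threshold from the 128kB default down to 8kB */
    mallopt(M_MMAP_THRESHOLD, 8 * 1024);
    return 0;
}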

 
 - Heikki
 



-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator

2015-02-15 Thread Andres Freund
On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote:
 On 15.2.2015 20:56, Heikki Linnakangas wrote:
  On 02/15/2015 08:57 PM, Tomas Vondra wrote:
  One of the wilder ideas (I mentined beer was involved!) was a memory
  allocator based on mmap [2], bypassing the libc malloc implementation
  altogether. mmap() has some nice features (e.g. no issues with returning
  memory back to the kernel, which may be problem with sbrk). So I hacked
  a bit and switched the AllocSet implementation to mmap().
  
  glibc's malloc() also uses mmap() for larger allocations. Precisely
  because those allocations can then be handed back to the OS. I don't
  think we'd want to use mmap() for small allocations either. Let's not
  re-invent malloc()..
 
 malloc() does that only for allocations over MAP_THRESHOLD, which is
 128kB by default. Vast majority of blocks we allocate are <= 8kB, so
 mmap() almost never happens.

The problem is that mmap() is, to my knowledge, noticeably more
expensive than sbrk(). Especially with concurrent workloads. Which is
why the malloc/libc authors chose to use sbrk...

IIRC glibc malloc also batches several allocation into mmap()ed areas
after some time.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator

2015-02-15 Thread Tomas Vondra
On 15.2.2015 21:13, Andres Freund wrote:
 On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote:

 malloc() does that only for allocations over MAP_THRESHOLD, which
 is 128kB by default. Vast majority of blocks we allocate are <=
 8kB, so mmap() almost never happens.
 
 The problem is that mmap() is, to my knowledge, noticeably more 
 expensive than sbrk(). Especially with concurrent workloads. Which is
 why the malloc/libc authors chose to use sbrk ...

Any ideas how to simulate such workloads? None of the tests I've done
suggested such issue exists.

 IIRC glibc malloc also batches several allocation into mmap()ed
 areas after some time.

Maybe, there's certainly a lot of such optimizations in libc. But how do
you return memory to system in that case?

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-15 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 2/15/15 10:36 AM, Tom Lane wrote:
 I wonder if we couldn't achieve largely the same positive effects through
 adding a simple transaction-level timeout option.

 A common use-case is long-running reports hitting relatively stable data 
 in a database that also has tables with a high churn rate (ie: queue 
 tables). In those scenarios your only options right now are to suffer 
 huge amounts of bloat in the high-churn or not do your reporting. A 
 simple transaction timeout only solves this by denying you reporting 
 queries.

Agreed, but Kevin's proposal has exactly the same problem only worse,
because (a) the reporting transactions might or might not fail (per
Murphy, they'll fail consistently only when you're on deadline), and
(b) if they do fail, there's nothing you can do short of increasing the
slack db-wide.

 An idea that I've had on this would be some way to lock down the 
 tables that a long-running transaction could access. That would allow 
 vacuum to ignore any snapshots that transaction had for tables it wasn't 
 accessing. That's something that would be deterministic.

There might be something in that, but again it's not much like this patch.
The key point I think we're both making is that nondeterministic failures
are bad, especially when you're talking about long-running, expensive-to-
retry transactions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator

2015-02-15 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote:
 On 15.2.2015 20:56, Heikki Linnakangas wrote:
 glibc's malloc() also uses mmap() for larger allocations. Precisely
 because those allocations can then be handed back to the OS. I don't
 think we'd want to use mmap() for small allocations either. Let's not
 re-invent malloc()..

 malloc() does that only for allocations over MAP_THRESHOLD, which is
 128kB by default. Vast majority of blocks we allocate are <= 8kB, so
 mmap() almost never happens.

 The problem is that mmap() is, to my knowledge, noticeably more
 expensive than sbrk(). Especially with concurrent workloads. Which is
 why the malloc/libc authors chose to use sbrk...

 IIRC glibc malloc also batches several allocation into mmap()ed areas
 after some time.

Keep in mind also that aset.c doubles the request size every time it goes
back to malloc() for some more space for a given context.  So you get up
to 128kB pretty quickly.
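
(A toy illustration of that doubling, assuming the usual 8kB initial block
size -- this is not aset.c itself:)

#include <stdio.h>

int
main(void)
{
    size_t  blksize = 8 * 1024;      /* typical initBlockSize */
    int     nblocks = 0;

    while (blksize < 128 * 1024)     /* glibc's default mmap threshold */
    {
        blksize *= 2;                /* each new block doubles the request */
        nblocks++;
    }
    printf("reached %zu bytes after %d further blocks\n", blksize, nblocks);
    return 0;
}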

There will be a population of 8K-to-64K chunks that don't ever get
returned to the OS but float back and forth between different
MemoryContexts as those are created and deleted.  I'm inclined to think
this is fine and we don't need to improve on it.

Part of the reason for my optimism is that on glibc-based platforms,
IME PG backends do pretty well at reducing their memory consumption back
down to a minimal value after each query.  (On other platforms, not so
much, but arguably that's libc's fault not ours.)  So I'm not really
seeing a problem that needs fixed, and definitely not one that a
platform-specific fix will do much for.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-15 Thread Jim Nasby

On 2/15/15 10:36 AM, Tom Lane wrote:

Kevin Grittner kgri...@ymail.com writes:

Tom Lane t...@sss.pgh.pa.us wrote:

Kevin Grittner kgri...@ymail.com writes:

What this patch does is add a GUC call old_snapshot_threshold.  It
defaults to -1, which leaves behavior matching unpatched code.
Above that it allows tuples to be vacuumed away after the number of
transaction IDs specified by the GUC have been consumed.



TBH, I'm not sure why we'd wish to emulate Oracle's single worst
operational feature.



I've run into cases where people have suffered horribly bloated
databases because of one ill-behaved connection.  Some companies
don't want to be vulnerable to that and the disruption that
recovery from that bloat causes.


No doubt, preventing bloat is a good thing, but that doesn't mean this
is the best API we could create for the issue.  The proposition this
patch offers to DBAs is: You can turn this knob to reduce bloat by some
hard-to-quantify factor.  The cost is that some long-running transactions
might fail.  You won't know which ones are at risk, the failures won't be
the same from time to time, and you won't be able to do anything to spare
high-value transactions from that fate except by turning that knob back
again globally (which requires a database restart).  Maybe refugees from
Oracle will think that sounds good, but nobody else will.

I wonder if we couldn't achieve largely the same positive effects through
adding a simple transaction-level timeout option.  That would be far
easier for users to understand and manage, it would be trivial to allow
specific high-value transactions to run with a laxer setting, it does not
create any implementation-detail-dependent behavior that we'd be having to
explain to users forevermore, and (not incidentally) it would be a lot
simpler and more trustworthy to implement.  There's no well-defined
correlation between your setting and the net effect on database bloat,
but that's true with the snapshot too old approach as well.


A common use-case is long-running reports hitting relatively stable data 
in a database that also has tables with a high churn rate (ie: queue 
tables). In those scenarios your only options right now are to suffer 
huge amounts of bloat in the high-churn or not do your reporting. A 
simple transaction timeout only solves this by denying you reporting 
queries.


An idea that I've had on this would be some way to lock down the 
tables that a long-running transaction could access. That would allow 
vacuum to ignore any snapshots that transaction had for tables it wasn't 
accessing. That's something that would be deterministic.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator

2015-02-15 Thread Heikki Linnakangas

On 02/15/2015 08:57 PM, Tomas Vondra wrote:

One of the wilder ideas (I mentined beer was involved!) was a memory
allocator based on mmap [2], bypassing the libc malloc implementation
altogether. mmap() has some nice features (e.g. no issues with returning
memory back to the kernel, which may be problem with sbrk). So I hacked
a bit and switched the AllocSet implementation to mmap().


glibc's malloc() also uses mmap() for larger allocations. Precisely 
because those allocations can then be handed back to the OS. I don't 
think we'd want to use mmap() for small allocations either. Let's not 
re-invent malloc()..


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Really bad blowups with hash outer join and nulls

2015-02-15 Thread Andrew Gierth
This came up today on IRC, though I suspect the general problem has been
seen before:

create table m3 (id uuid, status integer);
create table q3 (id uuid);
insert into m3
  select uuid_generate_v4(), floor(random() * 4)::integer
from generate_series(1,1000000);
insert into q3
  select id
from (select id, row_number() over () as rnum from m3) s
   where (rnum % 4)=0;
insert into q3 select null from generate_series(1,250000);

-- at this point m3 has a million unique rows, while q3 has
-- a quarter of the ids of m3, plus as many nulls

-- note that work_mem is at the default 4MB

explain analyze select status, count(*) from q3 left join m3 on (m3.id=q3.id) 
group by status;
  QUERY PLAN
  
--
 HashAggregate  (cost=51553.62..51553.67 rows=4 width=4) (actual 
time=259580.168..259580.179 rows=5 loops=1)
   Group Key: m3.status
   -  Hash Right Join  (cost=12927.31..49596.89 rows=391347 width=4) (actual 
time=18804.502..258498.897 rows=50 loops=1)
 Hash Cond: (m3.id = q3.id)
 -  Seq Scan on m3  (cost=0.00..16753.10 rows=1038310 width=20) 
(actual time=0.011..1618.762 rows=100 loops=1)
 -  Hash  (cost=6124.47..6124.47 rows=391347 width=16) (actual 
time=1742.340..1742.340 rows=50 loops=1)
   Buckets: 131072 (originally 131072)  Batches: 65536 (originally 
8)  Memory Usage: 8837kB
   -  Seq Scan on q3  (cost=0.00..6124.47 rows=391347 width=16) 
(actual time=0.033..774.732 rows=50 loops=1)
 Planning time: 0.178 ms
 Execution time: 259583.185 ms

The memory usage of this query runs to hundreds of megs despite the
small work_mem.

On 9.3.5, which the user was running, the problem was much more extreme
(this is with the default 1MB work_mem, and a data set only a quarter of
the size):

explain analyze select status, count(*) from q3 left join m3 on (m3.id=q3.id) 
group by status;
  QUERY PLAN
   
---
 HashAggregate  (cost=15020.11..15022.11 rows=200 width=4) (actual 
time=3733245.942..3733245.952 rows=5 loops=1)
   -  Hash Right Join  (cost=3976.50..14395.11 rows=125000 width=4) (actual 
time=227870.000..3732686.692 rows=125000 loops=1)
 Hash Cond: (m3.id = q3.id)
 -  Seq Scan on m3  (cost=0.00..4189.59 rows=259659 width=20) (actual 
time=0.024..807.162 rows=25 loops=1)
 -  Hash  (cost=1803.00..1803.00 rows=125000 width=16) (actual 
time=437.991..437.991 rows=125000 loops=1)
   Buckets: 4096  Batches: 262144 (originally 8)  Memory Usage: 
1954kB
   -  Seq Scan on q3  (cost=0.00..1803.00 rows=125000 width=16) 
(actual time=0.011..191.208 rows=125000 loops=1)
 Total runtime: 3733269.725 ms
(8 rows)

This query on 9.3.5 allocated over 2.7GB of RAM on my system.

Note the explosion in the number of batches. Tracing execution showed
that as the memory limit was reached while processing the large number
of nulls, the number of batches would be doubled, then only a tiny
number of rows could be written out as being no longer in the current
batch, so the limit was then quickly reached again.

A key part of the problem here may be the fact that nulls hash to
exactly 0, and are therefore always part of the initial in-memory batch
regardless of how much splitting happens. If a large subset of the data
had some other constant value, it would likely have some non-zero bits
in its hash value and therefore stand a good chance of being written out
to a batch file before things get too desperate, avoiding the mass
explosion.
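
For what it's worth, here is a simplified standalone sketch of the
hash-value computation for a join key -- the details and names are mine,
not the executor's. Each key column rotates the accumulated value left one
bit and, for non-null values only, XORs in that column's hash, so a single
NULL key always comes out as exactly 0:

#include <stdint.h>
#include <stdio.h>

static uint32_t
hash_join_key(const uint32_t *colhashes, const int *isnull, int ncols)
{
    uint32_t    hashkey = 0;
    int         i;

    for (i = 0; i < ncols; i++)
    {
        /* rotate left one bit */
        hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
        if (!isnull[i])
            hashkey ^= colhashes[i];   /* nulls contribute nothing */
    }
    return hashkey;
}

int
main(void)
{
    uint32_t    h = 0xdeadbeef;        /* ignored, because the column is NULL */
    int         isnull = 1;

    printf("hash of a single NULL key = %u\n", hash_join_key(&h, &isnull, 1));
    return 0;
}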

A quick test suggests that initializing the hash value to ~0 rather than
0 has a curious effect: the number of batches still explodes, but the
performance does not suffer the same way. (I think because almost all
the batches end up empty.) I think this is worth doing even in the
absence of a more general solution; nulls are common enough and
important enough that they shouldn't be the worst-case value if it can
be avoided.

(Testing this idea revealed that a regression test for rowsecurity is
missing an order by clause and changes result based on hash values)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assessing parallel-safety

2015-02-15 Thread Noah Misch
It's important for parallelism to work under the extended query protocol, and
that's nontrivial.  exec_parse_message() sets cursorOptions.
exec_bind_message() runs the planner.  We don't know if a parallel plan is
okay before seeing max_rows in exec_execute_message().

On Sun, Feb 15, 2015 at 01:29:06AM -0500, Robert Haas wrote:
 - For testing purposes, I set this up so that the executor activates
 parallel mode when running any query that seems to be parallel-safe.
 It seems nearly certain we'd only want to do this when a parallel plan
 was actually selected, but this behavior is awfully useful for
 testing, and revealed a number of bugs in the parallel-safety
 markings.

Why wouldn't that be a good permanent behavior?  The testing benefit extends
to users.  If I wrongly mark a function something other than
PROPARALLEL_UNSAFE, I'd rather find out ASAP.

 - The patch includes a script, fixpgproc.pl, which I used to insert
 and update the parallel markings on the system catalogs.  That is, of
 course, not intended for commit, but it's mighty useful for
 experimentation.

The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql
reset proparallel=u for some built-in functions, such as make_interval.

Some categories of PROPARALLEL_SAFE functions deserve more attention:

- database_to_xml* need to match proparallel of schema_to_xml*, which are
  PROPARALLEL_UNSAFE due to the heavyweight lock rule.  cursor_to_xml*
  probably need to do likewise.

- Several functions, such as array_in and anytextcat, can call arbitrary I/O
  functions.  This implies a policy that I/O functions must be
  PROPARALLEL_SAFE.  I agree with that decision, but it should be documented.
  Several json-related functions can invoke X-json cast functions; the
  marking implies that all such cast functions must be PROPARALLEL_SAFE.
  That's probably okay, too.

- All selectivity estimator functions are PROPARALLEL_SAFE.  That implies, for
  example, that any operator using eqjoinsel ought to be PROPARALLEL_SAFE or
  define its own restriction selectivity estimator function.  On the other
  hand, the planner need not check the parallel safety of selectivity
  estimator functions.  If we're already in parallel mode at the moment we
  enter the planner, we must be planning a query called within a
  PROPARALLEL_SAFE function.  If the query uses an unsafe operator, the
  containing function was mismarked.

- inet_{client,server}_{addr,port} are PROPARALLEL_SAFE; will the information
  they need indeed be available in workers?

- Assuming you don't want to propagate XactLastRecEnd from the slave back to
  the master, restrict XLogInsert() during parallel mode.  Restrict functions
  that call it, including pg_create_restore_point, pg_switch_xlog and
  pg_stop_backup.

- pg_extension_config_dump modifies the database, so it shall be
  PROPARALLEL_UNSAFE.  Assuming pg_stat_clear_snapshot won't reach back to
  the master to clear the snapshot, it shall be PROPARALLEL_RESTRICTED.

If a mismarked function elicits ERROR:  cannot retain locks acquired while in
parallel mode, innocent queries later in the transaction get the same error:

begin;
select 1;   -- ok
savepoint q; select database_to_xml(true, true, 'x'); rollback to q; -- ERROR
select 1;   -- ERROR
rollback;

 --- a/src/backend/commands/typecmds.c
 +++ b/src/backend/commands/typecmds.c
 @@ -1610,6 +1610,7 @@ makeRangeConstructors(const char *name, Oid namespace,
 false,
 /* leakproof */
 false,
 /* isStrict */
 
 PROVOLATILE_IMMUTABLE,/* volatility */
 +   
 PROPARALLEL_UNSAFE,   /* parallel safety */

Range constructors are PROPARALLEL_SAFE.

 --- a/src/backend/executor/functions.c
 +++ b/src/backend/executor/functions.c
 @@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list,
   if (queryTree->commandType == CMD_UTILITY)
   stmt = queryTree->utilityStmt;
   else
 - stmt = (Node *) pg_plan_query(queryTree, 0, 
 NULL);
 + stmt = (Node *) pg_plan_query(queryTree,
 + fcache->readonly_func ? 
 CURSOR_OPT_PARALLEL_OK : 0,
 + NULL);

(This code is deciding whether to allow parallelism in one statement of an
SQL-language function.)  Why does it care if the function is read-only?

I see that PL/pgSQL never allows parallelism in queries it launches.

 --- a/src/include/utils/lsyscache.h
 +++ b/src/include/utils/lsyscache.h
 @@ -79,6 +79,7 @@ extern bool op_mergejoinable(Oid opno, Oid inputtype);
  extern bool op_hashjoinable(Oid opno, 

[HACKERS] Issue installing doc tools on OSX

2015-02-15 Thread David Steele
I had a problem installing the doc tools following the directions for
OSX at http://www.postgresql.org/docs/9.4/static/docguide-toolsets.html.
 I'm running OSX Yosemite.

I got it to work by changing the command from:

sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2
docbook-xsl libxslt openjade opensp

To:

sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2
docbook-xsl libxslt opensp openjade

I didn't capture the error message unfortunately, but it was more or
less: unresolved dependency opensp while installing openjade.

Patch is attached.

-- 
- David Steele
da...@pgmasters.net
diff --git a/doc/src/sgml/docguide.sgml b/doc/src/sgml/docguide.sgml
index e0ae262..fa13404 100644
--- a/doc/src/sgml/docguide.sgml
+++ b/doc/src/sgml/docguide.sgml
@@ -280,7 +280,7 @@ apt-get install docbook docbook-dsssl docbook-xsl 
libxml2-utils openjade1.3 open
para
 If you use MacPorts, the following will get you set up:
 programlisting
-sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl 
libxslt openjade opensp
+sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2 docbook-xsl 
libxslt opensp openjade
 /programlisting
/para
   /sect2


signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] EXPERIMENTAL: mmap-based memory context / allocator

2015-02-15 Thread Tomas Vondra
On 15.2.2015 21:38, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On 2015-02-15 21:07:13 +0100, Tomas Vondra wrote:
 On 15.2.2015 20:56, Heikki Linnakangas wrote:
 glibc's malloc() also uses mmap() for larger allocations. Precisely
 because those allocations can then be handed back to the OS. I don't
 think we'd want to use mmap() for small allocations either. Let's not
 re-invent malloc()..
 
 malloc() does that only for allocations over MAP_THRESHOLD, which is
 128kB by default. Vast majority of blocks we allocate are <= 8kB, so
 mmap() almost never happens.
 
 The problem is that mmap() is, to my knowledge, noticeably more
 expensive than sbrk(). Especially with concurrent workloads. Which is
 why the malloc/libc authors chose to use sbrk...
 
 IIRC glibc malloc also batches several allocation into mmap()ed
 areas after some time.
 
 Keep in mind also that aset.c doubles the request size every time it
 goes back to malloc() for some more space for a given context. So you
 get up to 128kB pretty quickly.

That's true, so for sufficiently large contexts we're already using
mmap() indirectly, through libc. Some contexts use just 8kB
(ALLOCSET_SMALL_MAXSIZE), but that's just a minority.

 There will be a population of 8K-to-64K chunks that don't ever get 
 returned to the OS but float back and forth between different 
 MemoryContexts as those are created and deleted. I'm inclined to
 think this is fine and we don't need to improve on it.

Sure, but there are scenarios where that can't happen, because the
contexts are created 'concurrently' so the blocks can't float between
the contexts.

An example that comes to mind is array_agg() with many groups, which is
made worse by allocating the MemoryContext data in TopMemoryContext,
creating 'islands' and making it impossible to release the memory.

http://www.postgresql.org/message-id/e010519fbe83b1331ee0dfcb122a6...@fuzzy.cz

 Part of the reason for my optimism is that on glibc-based platforms,
  IME PG backends do pretty well at reducing their memory consumption 
 back down to a minimal value after each query. (On other platforms, 
 not so much, but arguably that's libc's fault not ours.) So I'm not 
 really seeing a problem that needs fixed, and definitely not one
 that a platform-specific fix will do much for.

I certainly agree this is not something we need to fix ASAP, and that
bypassing the libc may not be the right remedy. That's why I posted it
just here (and not to the CF), and marked it as experimental.

That however does not mean we can't improve this somehow - from time to
time I have to deal with machines where the minimum amount of memory
assigned to a process grew over time, gradually increasing memory
pressure and eventually causing trouble. There are ways to fix this
(e.g. by reopening the connections, thus creating a new backend).


-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gcc5: initdb produces gigabytes of _fsm files

2015-02-15 Thread Christoph Berg
Re: Tom Lane 2015-02-15 21030.1424022...@sss.pgh.pa.us
 Christoph Berg c...@df7cb.de writes:
  gcc5 is lurking in Debian experimental, and it's breaking initdb.
 
 FYI, this is now fixed in Red Hat's rawhide version:
 https://bugzilla.redhat.com/show_bug.cgi?id=1190978
 
 Don't know what the update process is like for Debian's copy, but
 maybe you could pester the appropriate people to absorb the referenced
 upstream fix quickly.

Thanks for pushing this towards the gcc people. I've updated the
Debian bugs so our gcc maintainers can upload a new version as well.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Selectivity estimation for inet operators

2015-02-15 Thread Emre Hasegeli
New version of the patch attached with the optimization to break the
loop before looking at all of the histogram values.  I can reduce
join selectivity estimation runtime by reducing the values of the
left hand side or both of the sides, if there is interest.

  Even if the above aspects of the code are really completely right, the
  comments fail to explain why.  I spent a lot of time on the comments,
  but so far as these points are concerned they still only explain what
  is being done and not why it's a useful calculation to make.

 I couldn't write better comments because I don't have strong arguments
 about it.  We can say that we don't try to make use of the both of
 the endpoints, because we don't know how to combine them.  We only use
 the one with matching family and masklen, and when both of them match
 we use the distant one to be on the safer side.

I added two more sentences to explain the calculation.
diff --git a/src/backend/utils/adt/network_selfuncs.c b/src/backend/utils/adt/network_selfuncs.c
index 73fc1ca..51a33c2 100644
--- a/src/backend/utils/adt/network_selfuncs.c
+++ b/src/backend/utils/adt/network_selfuncs.c
@@ -1,32 +1,972 @@
 /*-
  *
  * network_selfuncs.c
  *	  Functions for selectivity estimation of inet/cidr operators
  *
- * Currently these are just stubs, but we hope to do better soon.
+ * This module provides estimators for the subnet inclusion and overlap
+ * operators.  Estimates are based on null fraction, most common values,
+ * and histogram of inet/cidr columns.
  *
  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
  *	  src/backend/utils/adt/network_selfuncs.c
  *
  *-
  */
 #include "postgres.h"
 
+#include <math.h>
+
+#include "access/htup_details.h"
+#include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
 #include "utils/inet.h"
+#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
 
 
+/* Default selectivity for the inet overlap operator */
+#define DEFAULT_OVERLAP_SEL 0.01
+
+/* Default selectivity for the various inclusion operators */
+#define DEFAULT_INCLUSION_SEL 0.005
+
+/* Default selectivity for specified operator */
+#define DEFAULT_SEL(operator) \
+	((operator) == OID_INET_OVERLAP_OP ? \
+	 DEFAULT_OVERLAP_SEL : DEFAULT_INCLUSION_SEL)
+
+static Selectivity networkjoinsel_inner(Oid operator,
+	 VariableStatData *vardata1, VariableStatData *vardata2);
+static Selectivity networkjoinsel_semi(Oid operator,
+	VariableStatData *vardata1, VariableStatData *vardata2);
+static Selectivity mcv_population(float4 *mcv_numbers, int mcv_nvalues);
+static Selectivity inet_hist_value_sel(Datum *values, int nvalues,
+	Datum constvalue, int opr_codenum);
+static Selectivity inet_mcv_join_sel(Datum *mcv1_values,
+  float4 *mcv1_numbers, int mcv1_nvalues, Datum *mcv2_values,
+  float4 *mcv2_numbers, int mcv2_nvalues, Oid operator);
+static Selectivity inet_mcv_hist_sel(Datum *mcv_values, float4 *mcv_numbers,
+  int mcv_nvalues, Datum *hist_values, int hist_nvalues,
+  int opr_codenum);
+static Selectivity inet_hist_inclusion_join_sel(Datum *hist1_values,
+			 int hist1_nvalues,
+			 Datum *hist2_values, int hist2_nvalues,
+			 int opr_codenum);
+static Selectivity inet_semi_join_sel(Datum lhs_value,
+   bool mcv_exists, Datum *mcv_values, int mcv_nvalues,
+   bool hist_exists, Datum *hist_values, int hist_nvalues,
+   double hist_weight,
+   FmgrInfo *proc, int opr_codenum);
+static int	inet_opr_codenum(Oid operator);
+static int	inet_inclusion_cmp(inet *left, inet *right, int opr_codenum);
+static int inet_masklen_inclusion_cmp(inet *left, inet *right,
+		   int opr_codenum);
+static int inet_hist_match_divider(inet *boundary, inet *query,
+		int opr_codenum);
+
+/*
+ * Selectivity estimation for the subnet inclusion/overlap operators
+ */
 Datum
 networksel(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_FLOAT8(0.001);
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	Oid			operator = PG_GETARG_OID(1);
+	List	   *args = (List *) PG_GETARG_POINTER(2);
+	int			varRelid = PG_GETARG_INT32(3);
+	VariableStatData vardata;
+	Node	   *other;
+	bool		varonleft;
+	Selectivity selec,
+mcv_selec,
+non_mcv_selec;
+	Datum		constvalue,
+			   *hist_values;
+	int			hist_nvalues;
+	Form_pg_statistic stats;
+	double		sumcommon,
+nullfrac;
+	FmgrInfo	proc;
+
+	/*
+	 * If expression is not (variable op something) or (something op
+	 * variable), then punt and return a default estimate.
+	 */
+	if (!get_restriction_variable(root, args, varRelid,
+  &vardata, &other, &varonleft))
+		PG_RETURN_FLOAT8(DEFAULT_SEL(operator));
+
+	/*
+	 * Can't do anything useful if the something is not a constant, either.
+	 */
+	if (!IsA(other, Const))
+	{
+		

Re: [HACKERS] mogrify and indent features for jsonb

2015-02-15 Thread Andrew Dunstan


On 02/14/2015 10:06 PM, Andrew Dunstan wrote:


Attached is a patch to provide a number of very useful facilities to 
jsonb that people have asked for. These are based on work by Dmitry 
Dolgov in his jsonbx extension, but I take responsibility for any bugs.


The facilities are:

new operations:

concatenation: jsonb || jsonb -> jsonb
deletion: jsonb - text -> jsonb
deletion: jsonb - int -> jsonb

new functions:

produce indented text: jsonb_indent(jsonb) -> text
change an element at a path: jsonb_replace(jsonb, text[], jsonb) -> jsonb.



It would be relatively trivial to add:

delete an element at a path: jsonb_delete(jsonb, text[]) -> jsonb

and I think we should do that for the sake of completeness.

The docs might need a little extra work, and the indent code 
definitely needs work, which I hope to complete in the next day or 
two, but I wanted to put a stake in the ground.





In this version the indent code now works correctly, and there is an 
additional delete operator:


jsonb - text[] -> jsonb

Which deletes data at the designated path.

cheers

andrew

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d57243a..9936bff 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10256,6 +10256,30 @@ table2-mapping
        <entry>Do all of these key/element <emphasis>strings</emphasis> exist?</entry>
        <entry><literal>'["a", "b"]'::jsonb ?&amp; array['a', 'b']</literal></entry>
       </row>
+      <row>
+       <entry><literal>||</literal></entry>
+       <entry><type>jsonb</type></entry>
+       <entry>Concatentate these two values to make a new value</entry>
+       <entry><literal>'["a", "b"]'::jsonb || '["c", "d"]'::jsonb</literal></entry>
+      </row>
+      <row>
+       <entry><literal>-</literal></entry>
+       <entry><type>text</type></entry>
+       <entry>Delete the field with this key, or element with this value</entry>
+       <entry><literal>'{"a": "b"}'::jsonb - 'a' </literal></entry>
+      </row>
+      <row>
+       <entry><literal>-</literal></entry>
+       <entry><type>integer</type></entry>
+       <entry>Delete the field or element with this index (Negative integers count from the end)</entry>
+       <entry><literal>'["a", "b"]'::jsonb - 1 </literal></entry>
+      </row>
+      <row>
+       <entry><literal>-</literal></entry>
+       <entry><type>text[]</type></entry>
+       <entry>Delete the field or element with this path</entry>
+       <entry><literal>'["a", {"b":1}]'::jsonb - '{1,b}'::text[] </literal></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
@@ -10760,6 +10784,42 @@ table2-mapping
       <entry><literal>json_strip_nulls('[{"f1":1,"f2":null},2,null,3]')</literal></entry>
       <entry><literal>[{"f1":1},2,null,3]</literal></entry>
      </row>
+     <row>
+      <entry><para><literal>jsonb_replace(target jsonb, path text[], replacement jsonb)</literal>
+        </para></entry>
+      <entry><para><type>jsonb</type></para></entry>
+      <entry>
+        Returns <replaceable>target</replaceable>
+        with the section designated by <replaceable>path</replaceable>
+        replaced by <replaceable>replacement</replaceable>.
+      </entry>
+      <entry><literal>jsonb_replace('[{"f1":1,"f2":null},2,null,3]', '{0,f1}','[2,3,4]')</literal></entry>
+      <entry><literal>[{"f1":[2,3,4],"f2":null},2,null,3]</literal>
+       </entry>
+      </row>
+     <row>
+      <entry><para><literal>jsonb_indent(from_json jsonb)</literal>
+        </para></entry>
+      <entry><para><type>text</type></para></entry>
+      <entry>
+        Returns <replaceable>from_json</replaceable>
+        as indented json text.
+      </entry>
+      <entry><literal>jsonb_indent('[{"f1":1,"f2":null},2,null,3]')</literal></entry>
+      <entry>
+<programlisting>
+ [
+     {
+         "f1": 1,
+         "f2": null
+     },
+     2,
+     null,
+     3
+ ]
+</programlisting>
+       </entry>
+      </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 644ea6d..133b9ba 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -77,6 +77,8 @@ static void datum_to_jsonb(Datum val, bool is_null, JsonbInState *result,
 static void add_jsonb(Datum val, bool is_null, JsonbInState *result,
 		  Oid val_type, bool key_scalar);
 static JsonbParseState * clone_parse_state(JsonbParseState * state);
+static char *JsonbToCStringWorker(StringInfo out, JsonbContainer *in, int estimated_len, bool indent);
+static void add_indent(StringInfo out, bool indent, int level);
 
 /*
  * jsonb type input function
@@ -414,12 +416,40 @@ jsonb_in_scalar(void *pstate, char *token, JsonTokenType tokentype)
 char *
 JsonbToCString(StringInfo out, JsonbContainer *in, int estimated_len)
 {
+	return JsonbToCStringWorker(out, in, estimated_len, false);
+}
+
+/*
+ * same thing but with indentation turned on
+ */
+
+char *
+JsonbToCStringIndent(StringInfo out, JsonbContainer *in, int estimated_len)
+{
+	return JsonbToCStringWorker(out, in, estimated_len, true);
+}
+
+
+/*
+ * common worker for above two functions
+ */
+static 

[HACKERS] Rationalizing the API for array_ref, array_set, and friends

2015-02-15 Thread Tom Lane
The four functions array_ref, array_set, array_get_slice, array_set_slice
have traditionally declared their array inputs and results as being of
type ArrayType *.  This is a lie, and has been since Berkeley days,
because they actually also support fixed-length array types such as
name and point.  If anyone tried to reference such arguments/results
as an ArrayType struct they'd be in for a surprise.  We made it worse
when we invented toasting, because the arguments (though not the results,
at the moment) might also be toast pointers or short-header varlenas.
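
To make the shape of the change concrete, here are the declarations before
and after as I read them from the patch below -- a sketch with stand-in
typedefs so it compiles on its own, and with the two variants renamed (both
are really array_set); these are not the real headers:

#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;                /* stand-in, not the real definition */
typedef struct ArrayType ArrayType;     /* stand-in, opaque */

/* current style: promises a detoasted, aligned ArrayType that callers can't always supply */
extern ArrayType *array_set_before(ArrayType *array, int nSubscripts, int *indx,
                                   Datum dataValue, bool isNull,
                                   int arraytyplen, int elmlen, bool elmbyval,
                                   char elmalign);

/* direction of the cleanup: traffic in Datum and let arrayfuncs.c sort out what it was handed */
extern Datum array_set_after(Datum arraydatum, int nSubscripts, int *indx,
                             Datum dataValue, bool isNull,
                             int arraytyplen, int elmlen, bool elmbyval,
                             char elmalign);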

Aside from being confusing, ISTM this notational abuse runs some risk of
bad code generation, because a compiler might think the declaration of
these arguments entitles it to assume the pointers are 4-byte-aligned.
I'm not aware that we've had any actual bug reports like that, but as
the compiler boys get ever tenser, who knows what will happen?

Anyway, I'd been quietly averting my eyes from this issue for some time,
but preserving this mess is helping to make a mess of my expanded-array
patch, so I think it's time to do something about it.  Does anyone have
an objection to my pushing the attached cleanup patch?  It simply fixes
the function declarations and adds/subtracts DatumGet/GetDatum calls
as necessary.

regards, tom lane

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3b95552..44d3b3e 100644
*** a/src/backend/commands/extension.c
--- b/src/backend/commands/extension.c
*** pg_extension_config_dump(PG_FUNCTION_ARG
*** 2158,2170 
  			}
  		}
  
! 		a = array_set(a, 1, &arrayIndex,
! 	  elementDatum,
! 	  false,
! 	  -1 /* varlena array */ ,
! 	  sizeof(Oid) /* OID's typlen */ ,
! 	  true /* OID's typbyval */ ,
! 	  'i' /* OID's typalign */ );
  	}
  	repl_val[Anum_pg_extension_extconfig - 1] = PointerGetDatum(a);
  	repl_repl[Anum_pg_extension_extconfig - 1] = true;
--- 2158,2170 
  			}
  		}
  
! 		a = DatumGetArrayTypeP(array_set(PointerGetDatum(a), 1, &arrayIndex,
! 		 elementDatum,
! 		 false,
! 		 -1 /* varlena array */ ,
! 		 sizeof(Oid) /* OID's typlen */ ,
! 		 true /* OID's typbyval */ ,
! 		 'i' /* OID's typalign */ ));
  	}
  	repl_val[Anum_pg_extension_extconfig - 1] = PointerGetDatum(a);
  	repl_repl[Anum_pg_extension_extconfig - 1] = true;
*** pg_extension_config_dump(PG_FUNCTION_ARG
*** 2196,2208 
  			elog(ERROR, extconfig and extcondition arrays do not match);
  
  		/* Add or replace at same index as in extconfig */
! 		a = array_set(a, 1, &arrayIndex,
! 	  elementDatum,
! 	  false,
! 	  -1 /* varlena array */ ,
! 	  -1 /* TEXT's typlen */ ,
! 	  false /* TEXT's typbyval */ ,
! 	  'i' /* TEXT's typalign */ );
  	}
  	repl_val[Anum_pg_extension_extcondition - 1] = PointerGetDatum(a);
  	repl_repl[Anum_pg_extension_extcondition - 1] = true;
--- 2196,2208 
  			elog(ERROR, extconfig and extcondition arrays do not match);
  
  		/* Add or replace at same index as in extconfig */
! 		a = DatumGetArrayTypeP(array_set(PointerGetDatum(a), 1, &arrayIndex,
! 		 elementDatum,
! 		 false,
! 		 -1 /* varlena array */ ,
! 		 -1 /* TEXT's typlen */ ,
! 		 false /* TEXT's typbyval */ ,
! 		 'i' /* TEXT's typalign */ ));
  	}
  	repl_val[Anum_pg_extension_extcondition - 1] = PointerGetDatum(a);
  	repl_repl[Anum_pg_extension_extcondition - 1] = true;
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 0e7400f..e66d7b5 100644
*** a/src/backend/executor/execQual.c
--- b/src/backend/executor/execQual.c
*** static Datum ExecEvalCurrentOfExpr(ExprS
*** 252,263 
   *
   * NOTE: if we get a NULL result from a subscript expression, we return NULL
   * when it's an array reference, or raise an error when it's an assignment.
-  *
-  * NOTE: we deliberately refrain from applying DatumGetArrayTypeP() here,
-  * even though that might seem natural, because this code needs to support
-  * both varlena arrays and fixed-length array types.  DatumGetArrayTypeP()
-  * only works for the varlena kind.  The routines we call in arrayfuncs.c
-  * have to know the difference (that's what they need refattrlength for).
   *--
   */
  static Datum
--- 252,257 
*** ExecEvalArrayRef(ArrayRefExprState *asta
*** 267,274 
   ExprDoneCond *isDone)
  {
  	ArrayRef   *arrayRef = (ArrayRef *) astate->xprstate.expr;
! 	ArrayType  *array_source;
! 	ArrayType  *resultArray;
  	bool		isAssignment = (arrayRef->refassgnexpr != NULL);
  	bool		eisnull;
  	ListCell   *l;
--- 261,267 
   ExprDoneCond *isDone)
  {
  	ArrayRef   *arrayRef = (ArrayRef *) astate->xprstate.expr;
! 	Datum		array_source;
  	bool		isAssignment = (arrayRef->refassgnexpr != NULL);
  	bool		eisnull;
  	ListCell   *l;
*** ExecEvalArrayRef(ArrayRefExprState *asta
*** 278,288 
 

Re: [HACKERS] Issue installing doc tools on OSX

2015-02-15 Thread David Steele
On 2/15/15 7:50 PM, Peter Eisentraut wrote:
 On 2/15/15 6:31 PM, David Steele wrote:
 I had a problem installing the doc tools following the directions for
 OSX at http://www.postgresql.org/docs/9.4/static/docguide-toolsets.html.
  I'm running OSX Yosemite.

 I got it to work by changing the command from:

 sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2
 docbook-xsl libxslt openjade opensp

 To:

 sudo port install docbook-dsssl docbook-sgml-4.2 docbook-xml-4.2
 docbook-xsl libxslt opensp openjade

 I didn't capture the error message unfortunately, but it was more or
 less: unresolved dependency opensp while installing openjade.
 
 That seems a bit incredible, since port should be able to resolve the
 dependencies by itself.  I suggest that this should be reported as a bug
 to MacPorts.

Sure, that has been my experience, but the error message was very clear.
 Unfortunately I did not capture the error before I changed the order
and the log file was removed on the next run.

Since I have no easy way to reproduce it I'm not sure it's worth
submitting as a bug.  However, I thought a reorder of the packages in
our docs couldn't hurt, and might help.

-- 
- David Steele
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-15 Thread Peter Geoghegan
On Fri, Feb 13, 2015 at 7:22 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 In patch 1, sepgsql is also affected by this commit.  Note that this commit
 necessitates an initdb, since stored ACLs are broken.

 Doesn't that warrant bumping catversion?

Yes, but traditionally that is left until the last minute - when the
patch is committed. That's why it's called out in the commit message
(it isn't otherwise obvious - it's not a common catversion
necessitating change like an addition to a system catalog).

 Patch 2
 + * When killing a speculatively-inserted tuple, we set xmin to invalid
 and
 +if (!(xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE))

 When doing this, should we also set the HEAP_XMIN_INVALID hint bit?

 reads more of patch

 Ok, I see we're not doing this because the check for a super deleted tuple
 is already cheap. Probably worth mentioning that in the comment...

See my remarks to Heikki on this. I don't think it adds much.

 ExecInsert():
 + * We don't suppress the effects (or, perhaps, side-effects) of
 + * BEFORE ROW INSERT triggers.  This isn't ideal, but then we
 + * cannot proceed with even considering uniqueness violations until
 + * these triggers fire on the one hand, but on the other hand they
 + * have the ability to execute arbitrary user-defined code which
 + * may perform operations entirely outside the system's ability to
 + * nullify.

 I'm a bit confused as to why we're calling BEFORE triggers out here...
 hasn't this always been true for both BEFORE *and* AFTER triggers? The
 comment makes me wonder if there's some magic that happens with AFTER...

Yes, but the difference here is that the UPDATE path could be taken
(which is sort of like when a before row insert path returns NULL).
What I'm calling out is the dependency on firing before row insert
triggers to decide if the alternative path must be taken. Roughly
speaking, we must perform part of the INSERT (firing of before row
insert triggers) in order to decide to do or not do an INSERT. This
is, as I said, similar to when those triggers return NULL, and won't
matter with idiomatic patterns of before trigger usage. Still feels
worth calling out, because sometimes users do foolishly write before
triggers with many external side-effects.

 ExecLockUpdateTuple():
 + * Try to lock tuple for update as part of speculative insertion.  If
 + * a qual originating from ON CONFLICT UPDATE is satisfied, update
 + * (but still lock row, even though it may not satisfy estate's
 + * snapshot).

 I find this confusing... which row is being locked? The previously inserted
 one? Perhaps a better wording would be "satisfied, update. Lock the original
 row even if it doesn't satisfy estate's snapshot."

Take a look at the executor README. We're locking the only row that
can be locked when an UPSERT tentatively decides to take the
UPDATE path: the row that was found during our pre-check. We can only
UPDATE when we find such a tuple, and then lock it without finding a
row-level conflict.

 infer_unique_index():
 +/*
 + * We need not lock the relation since it was already locked, either by
 + * the rewriter or when expand_inherited_rtentry() added it to the query's
 + * rangetable.
 + */
 +relationObjectId = rt_fetch(parse->resultRelation, parse->rtable)->relid;
 +
 +relation = heap_open(relationObjectId, NoLock);

 Seems like there should be an Assert here. Also, the comment should probably
 go before the heap_open call.

An Assert() of what? Note that the similar function
get_relation_info() does about the same thing here.

 heapam_xlog.h:
 +/* reuse xl_heap_multi_insert-only bit for xl_heap_delete */
 I wish this would mention why it's safe to do this. Also, the comment
 mentions xl_heap_delete when the #define is for
 XLOG_HEAP_KILLED_SPECULATIVE_TUPLE; presumably that's wrong. Perhaps:
 /*
  * reuse XLOG_HEAP_LAST_MULTI_INSERT bit for
  * XLOG_HEAP_KILLED_SPECULATIVE_TUPLE. This is safe because we never do
  * multi-inserts for INSERT ON CONFLICT.
  */

It's safe, as the comment indicates, because the former is only used
for xl_heap_multi_insert records, while the latter is only used for
xl_heap_delete records. There is no ambiguity, because naturally we're
always able to establish what type of record is under consideration
before checking whether the bit is set.

The XLOG_HEAP_* format is used for other flags there, despite the fact
that other flags (like XLOG_HEAP_PREFIX_FROM_OLD) can only appear in
certain context-appropriate xl_heap_* records. AFAICT, all that I've
done that's new here is rely on that, since a bunch of those bits were
used up in the last year or two, and the need to even consider bit
reuse here is a new problem.

 I'll review the remaining patches later.

Thanks.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Really bad blowups with hash outer join and nulls

2015-02-15 Thread Andrew Gierth
 Tomas == Tomas Vondra tomas.von...@2ndquadrant.com writes:

 Tomas Improving the estimates is always good, but it's not going to
 Tomas fix the case of non-NULL values (it shouldn't be all that
 Tomas difficult to create such examples with a value whose hash starts
 Tomas with a bunch of zeroes).

Right now, I can't get it to plan such an example, because (a) if there
are no stats to work from then the planner makes fairly pessimistic
assumptions about hash bucket filling, and (b) if there _are_ stats to
work from, then a frequently-occurring non-null value shows up as an MCV
and the planner takes that into account to calculate bucketsize.

The problem could only be demonstrated for NULLs because the planner was
ignoring NULL for the purposes of estimating bucketsize, which is
correct for all join types except RIGHT and FULL (which, iirc, are more
recent additions to the hashjoin repertoire).

If you want to try testing it, you may find this useful:

select i, hashint8(i) from unnest(array[1474049294, -1779024306, -1329041947]) 
u(i);
      i      | hashint8 
-------------+----------
  1474049294 |        0
 -1779024306 |        0
 -1329041947 |        0
(3 rows)

(those are the only three int4 values that hash to exactly 0)

It's probably possible to construct pathological cases by finding a lot
of different values with zeros in the high bits of the hash, but that's
something that wouldn't be likely to happen by chance.
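
A brute-force way to go hunting for such values (slow, and purely
illustrative) is to scan a range of integers and keep only those whose
hash fits entirely in the low 16 bits, i.e. whose high bits are all
zero:

select i
  from generate_series(1, 10000000) s(i)
 where hashint8(i) >= 0 and hashint8(i) < 65536;

Since only the high bits need to be zero, there are presumably far more
candidates of this kind than the three exact-zero values above.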

 Tomas I think this might be solved by relaxing the check a bit.

Yeah, that looks potentially useful.

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CF app deployment

2015-02-15 Thread Peter Geoghegan
On Sun, Feb 15, 2015 at 4:59 PM, Peter Eisentraut pete...@gmx.net wrote:
 I think the old system where the patch submitter declared "this message
 contains my patch" is the only one that will work.

I tend to agree. That being said, calling out latest attachments is
also useful (or highlighting that a particular mail has a particular
file attached in general).

Annotations-wise, I think that it would be nice to be able to modify
an annotation, in case a typo is made (the old app supported this). I
also think that it's a waste of screen space to show who added each
annotation within the annotation view. Granted, the old app showed that
too, but I tend to
think that if I actually cared who added a certain annotation, I'd be
happy to drill down into history. I haven't cared so far, AFAICR.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2015-02-15 Thread Etsuro Fujita
On 2014/09/13 0:13, Tom Lane wrote:
 Albe Laurenz laurenz.a...@wien.gv.at writes:
 Tom Lane wrote:
 I'm not sure offhand what the new plan tree ought to look like.  We could
 just generate a ForeignScan node, but that seems like rather a misnomer.
 Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
 be ForeignScan but with a flag field saying hey this is really an update
 (or a delete).  The main benefit I can think of right now is that the
 EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
 the only thing that ever looks at plan trees, so having an outright
 misleading plan structure is likely to burn us down the line.
 
 I can understand these qualms.
 I wonder if ForeignUpdate is such a good name though, since it would
 surprise the uninitiate that in the regular (no push-down) case the
 actual modification is *not* performed by ForeignUpdate.
 So it should rather be a ForeignModifyingScan, but I personally would
 prefer a has_side_effects flag on ForeignScan.
 
 I was envisioning that the EXPLAIN output would look like
 
  Foreign Scan on tab1
Remote SQL: SELECT ...
 
 for the normal case, versus
 
  Foreign Update on tab1
Remote SQL: UPDATE ...
 
 for the pushed-down-update case (and similarly for DELETE).  For a
 non-optimized update it'd still be a ForeignScan underneath a ModifyTable.
 
 As for the internal representation, I was thinking of adding a CmdType
 field to struct ForeignScan, with currently only CMD_SELECT, CMD_UPDATE,
 CMD_DELETE as allowed values, though possibly in future we'd think of a
 reason to allow CMD_INSERT there.  This is more or less isomorphic to your
 has_side_effects flag, but it allows distinguishing UPDATE from DELETE
 which might be useful.
 
 The only thing that's bothering me about this concept is that I'm not
 seeing how to scale it up to handling a pushed-down update on a join,
 ie, UPDATE foo ... FROM bar ... where both foo and bar are remote.
 Maybe it's silly to worry about that until join push-down is done;
 but in that case I'd vote for postponing this whole patch until we
 have join push-down.

I'll re-add this to the final CF.  And I'll update the patch.

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical Replication Helpers WIP for discussion

2015-02-15 Thread Michael Paquier
On Sun, Feb 15, 2015 at 11:25 PM, Petr Jelinek p...@2ndquadrant.com wrote:

 On 13/02/15 14:04, Petr Jelinek wrote:

 On 13/02/15 08:48, Michael Paquier wrote:


 Looking at this patch, I don't see that we actually gain much here
 except a decoder plugin that speaks a special protocol for a special
 background worker that has not been presented yet. What is the actual
 value of having that defined as a contrib/ module in core? Note that we
 already have test_decoding to basically test the logical decoding facility,
 used at least at the SQL level to get logical changes decoded.

 Based on those reasons I am planning to mark this as rejected (it has no
 documentation as well). So please speak up if you think the contrary,
 but it seems to me that this could live happily out of core.


 I think you are missing the point of this; it's not meant to be committed
 in this form at all, and even less as a contrib module. It was meant as a
 basis for in-core logical replication discussion, but sadly I didn't
 really have time to pursue it in this CF in the end.


 That being said, and looking at the size of the February CF, I think I am
 fine with dropping this in the 9.5 cycle; it does not seem likely that
 anything useful will be done with this fast enough to get into 9.5, so
 there is no point in spending committer resources on it in the final CF.

 I will pick it up again after the CF is done.


OK, thanks for the clarifications. Note that I am marking it as rejected
in CF 2014-12 not because it is something that is not wanted, but simply
to avoid re-adding it to CF 2015-02, which is what "returned with
feedback" actually does...
-- 
Michael


Re: [HACKERS] Commit fest 2015-12 enters money time

2015-02-15 Thread Michael Paquier
On Fri, Feb 13, 2015 at 10:06 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 In order to move on to the next CF, I am going to go through all the
 remaining patches and update their status accordingly. And sorry for
 slacking a bit regarding that. If you wish to complain, of course feel
 free to post messages on this thread or on the thread related to the
 patch itself.


As all the entries have been either moved or updated, CF 2014-12 is closed,
and CF 2015-02 has been changed to In Progress.

Regards,
-- 
Michael


Re: [HACKERS] Join push-down support for foreign tables

2015-02-15 Thread Kouhei Kaigai
Hanada-san,

Your patch mixes enhancement of the custom-/foreign-scan interface with
enhancement of contrib/postgres_fdw... Probably it is a careless mis-
operation.
Please make your patch as a set of differences from my infrastructure
portion.


Also, I noticed that this "Join push-down support for foreign tables" patch
was unintentionally rejected in the last commit fest.
  https://commitfest.postgresql.org/3/20/
I couldn't register myself as a reviewer. How do I do that in the
new commitfest application?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada
 Sent: Monday, February 16, 2015 1:03 PM
 To: Robert Haas
 Cc: PostgreSQL-development
 Subject: Re: [HACKERS] Join push-down support for foreign tables
 
 Hi
 
 I've revised the patch based on Kaigai-san's custom/foreign join patch
 posted in the thread below.
 
 http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80
 108c...@bpxm15gp.gisp.nec.co.jp
 
 Basically not changed from the version in the last CF, but as Robert
  commented before, N-way (not only 2-way) joins should be supported in the
  first version, by constructing the SELECT SQL with the source queries
  contained in the FROM clause as inline views (a.k.a. FROM-clause subqueries).
 
 2014-12-26 13:48 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com:
  2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com:
  On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada
  shigeru.han...@gmail.com wrote:
  I'm working on $SUBJECT and would like to get comments about the
  design.  Attached patch is for the design below.
 
  I'm glad you are working on this.
 
  1. Join source relations
  As described above, postgres_fdw (and most of SQL-based FDWs) needs
  to check that 1) all foreign tables in the join belong to a server,
  and
  2) all foreign tables have same checkAsUser.
  In addition to that, I add extra limitation that both inner/outer
  should be plain foreign tables, not a result of foreign join.  This
  limitation makes SQL generator simple.  Fundamentally it's possible
  to join even join relations, so N-way join is listed as enhancement
  item below.
 
  It seems pretty important to me that we have a way to push the entire
  join nest down.  Being able to push down a 2-way join but not more
  seems like quite a severe limitation.
 
  Hmm, I agree to support N-way join is very useful.  Postgres-XC's SQL
  generator seems to give us a hint for such case, I'll check it out
  again.
 
  --
  Shigeru HANADA
 
 
 
 --
 Shigeru HANADA

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-15 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby jim.na...@bluetreble.com writes:
 On 2/15/15 10:36 AM, Tom Lane wrote:
 I wonder if we couldn't achieve largely the same positive effects through
 adding a simple transaction-level timeout option.

We suggested this to our customer and got out of the meeting with
it looking like it *might* fly.  In the next meeting, however, they
said they had run it by others and reviewed the code and it was
completely unacceptable -- they would not consider pg with this as
the solution.

 A common use-case is long-running reports hitting relatively stable data
 in a database that also has tables with a high churn rate (ie: queue
 tables). In those scenarios your only options right now are to suffer
 huge amounts of bloat in the high-churn or not do your reporting. A
 simple transaction timeout only solves this by denying you reporting
 queries.

 Agreed, but Kevin's proposal has exactly the same problem only worse,
 because (a) the reporting transactions might or might not fail (per
 Murphy, they'll fail consistently only when you're on deadline), and
 (b) if they do fail, there's nothing you can do short of increasing the
 slack db-wide.

These they were comfortable with, and did *not* consider to be
unpredictable or something they could do nothing about.
I really don't feel I can say more than that, though, without
disclosing more than I should.

 An idea that I've had on this would be some way to lock down the
 tables that a long-running transaction could access. That would allow
 vacuum to ignore any snapshots that transaction had for tables it wasn't
 accessing. That's something that would be deterministic.

While this option was not specifically suggested, based on their
reasons for finding numerous other options we suggested
unacceptable, I feel sure that this would not be acceptable either.

I think Tom hit the nail on the head when he said "Maybe refugees
from Oracle will think that sounds good..."  It is precisely those
with very large code bases which have been modified over long
periods of time to work with Oracle that would find this solution
attractive, or perhaps *necessary* to consider a move to Postgres.
That's a potential market we don't care to write off.

 There might be something in that, but again it's not much like this patch.
 The key point I think we're both making is that nondeterministic failures
 are bad, especially when you're talking about long-running, expensive-to-
 retry transactions.

What the customer most doesn't want to be nondeterministic is
whether the error is generated only when the snapshot has been used
to read a page which has been modified since the snapshot was
taken.  If tables or materialized views are set up before a query
and then not subsequently modified during the execution of the
query, that query must not be canceled even if it runs for days,
but it must not cause unconstrained bloat during the run.  So far I
don't see any way to avoid including the LSN with the snapshot or
modifying the index AMs.  Let's be clear on the footprint for that;
for the btree implementation it is:

 src/backend/access/nbtree/nbtinsert.c |  7 ---
 src/backend/access/nbtree/nbtpage.c   |  2 +-
 src/backend/access/nbtree/nbtsearch.c | 43 ++-
 src/include/access/nbtree.h           |  7 ---
 4 files changed, 43 insertions(+), 16 deletions(-)

What this discussion has made me reconsider is the metric for
considering a transaction too old.  The number of transaction IDs
consumed seems inferior to LSN or time as the unit of measure for
that.

It looks to me to be pretty trivial (on the order of maybe 30 lines
of code) to specify this GUC in minutes rather than transaction
IDs.  At first glance this seems like it would be vulnerable to the
usual complaints about mismanaged clocks, but those are easily
answered by using a cached time in shared memory that we populate
in such a way that it can never move backward.  Done right, this
could even allow the GUC to be changeable on reload rather than
only at restart.  A badly mismanaged system clock could not cause a
query to generate incorrect results; the worst that could happen is
that this feature would fail to control bloat as much as expected
or reads of modified data could generate the "snapshot too old"
error around the time of the clock adjustment.

As before, this would default to a magic value to mean that you
want the historical PostgreSQL behavior.
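
To sketch what that would look like at the user level (the GUC name and
unit spelling here are entirely hypothetical; the patch may spell this
differently):

-- Hypothetical GUC, for illustration only: a minutes-based,
-- reload-able threshold, with a magic value for the historical behavior.
ALTER SYSTEM SET snapshot_age_limit = '120min';
SELECT pg_reload_conf();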

If that makes the idea more palatable to anyone, I can submit a
patch to that effect within the next 24 hours.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump gets attributes from tables in extensions

2015-02-15 Thread Michael Paquier
On Mon, Feb 16, 2015 at 4:45 PM, Michael Paquier
michael.paqu...@gmail.comwrote:

 but not dumping any tables that are part of ext_member.


Excuse the typo => s/but/by/.
-- 
Michael


Re: [HACKERS] Join push-down support for foreign tables

2015-02-15 Thread Shigeru Hanada
Hi

I've revised the patch based on Kaigai-san's custom/foreign join patch
posted in the thread below.

http://www.postgresql.org/message-id/9a28c8860f777e439aa12e8aea7694f80108c...@bpxm15gp.gisp.nec.co.jp

Basically not changed from the version in the last CF, but as Robert
commented before, N-way (not only 2-way) joins should be supported in
the first version, by constructing the SELECT SQL with the source
queries contained in the FROM clause as inline views (a.k.a. FROM-clause
subqueries).
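
For illustration only, the remote SQL generated for a pushed-down 3-way
join could look something like this (the table and column names are made
up; this is just a sketch of the inline-view approach, not actual output
of the patch):

SELECT l.a, r.b, s.c
  FROM (SELECT id, a FROM t1) l
  JOIN (SELECT id, b FROM t2) r ON (l.id = r.id)
  JOIN (SELECT id, c FROM t3) s ON (r.id = s.id);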

2014-12-26 13:48 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com:
 2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com:
 On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada
 shigeru.han...@gmail.com wrote:
 I'm working on $SUBJECT and would like to get comments about the
 design.  Attached patch is for the design below.

 I'm glad you are working on this.

 1. Join source relations
 As described above, postgres_fdw (and most of SQL-based FDWs) needs to
 check that 1) all foreign tables in the join belong to a server, and
 2) all foreign tables have same checkAsUser.
 In addition to that, I add extra limitation that both inner/outer
 should be plain foreign tables, not a result of foreign join.  This
 limitation makes SQL generator simple.  Fundamentally it's possible to
 join even join relations, so N-way join is listed as enhancement item
 below.

 It seems pretty important to me that we have a way to push the entire
 join nest down.  Being able to push down a 2-way join but not more
 seems like quite a severe limitation.

 Hmm, I agree to support N-way join is very useful.  Postgres-XC's SQL
 generator seems to give us a hint for such case, I'll check it out
 again.

 --
 Shigeru HANADA



-- 
Shigeru HANADA


foreign_join.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join push-down support for foreign tables

2015-02-15 Thread Shigeru Hanada
Kaigai-san,

Oops.  I rebased the patch onto your v4 custom/foreign join patch.
But as you mentioned off-list, I found that a flaw (an inappropriate
change to NestPath) still remains in the patch...  I might have gotten
my dev branch into an unexpected state.  I'll check it soon.

2015-02-16 13:13 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 Hanada-san,

 Your patch mixes enhancement of the custom-/foreign-scan interface with
 enhancement of contrib/postgres_fdw... Probably it is a careless mis-
 operation.
 Please make your patch as a set of differences from my infrastructure
 portion.


 Also, I noticed that this "Join push-down support for foreign tables" patch
 was unintentionally rejected in the last commit fest.
   https://commitfest.postgresql.org/3/20/
 I couldn't register myself as a reviewer. How do I do that in the
 new commitfest application?

 Thanks,
 --
 NEC OSS Promotion Center / PG-Strom Project
 KaiGai Kohei kai...@ak.jp.nec.com


 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Shigeru Hanada
 Sent: Monday, February 16, 2015 1:03 PM
 To: Robert Haas
 Cc: PostgreSQL-development
 Subject: Re: [HACKERS] Join push-down support for foreign tables

 Hi

 I've revised the patch based on Kaigai-san's custom/foreign join patch
 posted in the thread below.

 http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F80
 108c...@bpxm15gp.gisp.nec.co.jp

 Basically not changed from the version in the last CF, but as Robert
  commented before, N-way (not only 2-way) joins should be supported in the
  first version, by constructing the SELECT SQL with the source queries
  contained in the FROM clause as inline views (a.k.a. FROM-clause subqueries).

 2014-12-26 13:48 GMT+09:00 Shigeru Hanada shigeru.han...@gmail.com:
  2014-12-16 1:22 GMT+09:00 Robert Haas robertmh...@gmail.com:
  On Mon, Dec 15, 2014 at 3:40 AM, Shigeru Hanada
  shigeru.han...@gmail.com wrote:
  I'm working on $SUBJECT and would like to get comments about the
  design.  Attached patch is for the design below.
 
  I'm glad you are working on this.
 
  1. Join source relations
  As described above, postgres_fdw (and most of SQL-based FDWs) needs
  to check that 1) all foreign tables in the join belong to a server,
  and
  2) all foreign tables have same checkAsUser.
  In addition to that, I add extra limitation that both inner/outer
  should be plain foreign tables, not a result of foreign join.  This
  limitation makes SQL generator simple.  Fundamentally it's possible
  to join even join relations, so N-way join is listed as enhancement
  item below.
 
  It seems pretty important to me that we have a way to push the entire
  join nest down.  Being able to push down a 2-way join but not more
  seems like quite a severe limitation.
 
  Hmm, I agree to support N-way join is very useful.  Postgres-XC's SQL
  generator seems to give us a hint for such case, I'll check it out
  again.
 
  --
  Shigeru HANADA



 --
 Shigeru HANADA



-- 
Shigeru HANADA


foreign_join.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_dump gets attributes from tables in extensions

2015-02-15 Thread Michael Paquier
Hi all,

While looking at the patch to fix pg_dump with extensions containing tables
referencing each other, I got surprised by the fact that getTableAttrs
tries to dump table attributes even for tables that are part of an
extension. Is that normal?
Attached is a patch that I think makes things right, but not dumping any
tables that are part of ext_member.
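
For reference, extension membership is recorded in pg_depend with
deptype 'e', so the tables that pg_dump treats as extension members can
be listed with a query along these lines (just a sketch, independent of
the patch itself):

SELECT e.extname, c.relname
  FROM pg_depend d
  JOIN pg_extension e ON e.oid = d.refobjid
  JOIN pg_class c ON c.oid = d.objid
 WHERE d.classid = 'pg_class'::regclass
   AND d.refclassid = 'pg_extension'::regclass
   AND d.deptype = 'e'
   AND c.relkind = 'r';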

Thoughts?

Regards,
-- 
Michael
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7e92b74..2c9356b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -6657,6 +6657,10 @@ getTableAttrs(Archive *fout, DumpOptions *dopt, TableInfo *tblinfo, int numTable
 		if (!tbinfo->interesting)
 			continue;
 
+		/* Ignore tables part of an extension */
+		if (tbinfo->dobj.ext_member)
+			continue;
+
 		/*
 		 * Make sure we are in proper schema for this table; this allows
 		 * correct retrieval of formatted type names and default exprs

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers