Re: [HACKERS] multixacts woes

2015-05-10 Thread Noah Misch
On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote:
 On Sun, May 10, 2015 at 1:40 PM, Noah Misch n...@leadboat.com wrote:
  I don't know whether this deserves prompt remediation, but if it does, I 
  would
  look no further than the hard-coded 25% figure.  We permit users to operate
  close to XID wraparound design limits.  GUC maximums force an 
  anti-wraparound
  vacuum at no later than 93.1% of design capacity.  XID assignment warns at
  99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
  pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
  50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
  bulkiest mandatory cushion yet, an anti-wraparound vacuum when
  pg_multixact/members is just 25% full.
 
 That's certainly one possible approach.  I had discounted it because
 you can't really get more than a small multiple out of it, but getting
 2-3x more room might indeed be enough to help some people quite a bit.
 Just raising the threshold from 25% to say 40% would buy back a
 healthy amount.

Right.  It's fair to assume that the new VACUUM burden would be discountable
at a 90+% threshold, because the installations that could possibly find it
expensive are precisely those experiencing corruption today.  These reports
took eighteen months to appear, whereas some corruption originating in commit
0ac5ad5 saw reports within three months.  Therefore, sites burning
pg_multixact/members proportionally faster than both pg_multixact/offsets and
XIDs must be unusual.  Bottom line: if we do need to reduce VACUUM burden
caused by the commits you cited upthread, we almost certainly don't need more
than a 4x improvement.
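As a back-of-envelope check of that bound (a sketch only; 4294967296 = 2^32 is
the size of the member address space, and 90% is an assumed upper threshold):

-- headroom from raising the member-space threshold from 25% toward 90%
SELECT round((0.90 * 4294967296) / (0.25 * 4294967296), 1) AS gain;  -- 3.6, under 4x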




[HACKERS] default value doesn't get applied in this situation

2015-05-10 Thread Aliouii Ali

here is a test case:


CREATE TABLE tab
(
  _id bigserial NOT NULL,
  _name text,
  CONSTRAINT tab_pkey PRIMARY KEY (_id)
);
CREATE TABLE tab_s1
(
CONSTRAINT tab_s1_check CHECK (1 = 1)
)
INHERITS (tab);
CREATE OR REPLACE VIEW v_tab AS
 SELECT tab._id,
        tab._name
   FROM tab;
CREATE OR REPLACE FUNCTION tab_insert()
RETURNS trigger AS
$BODY$
BEGIN
    INSERT INTO tab_s1 VALUES ((NEW).*);
    RETURN NEW;
END $BODY$
LANGUAGE plpgsql;
CREATE TRIGGER tab_trigger
INSTEAD OF INSERT ON v_tab
FOR EACH ROW EXECUTE PROCEDURE tab_insert();

-- the query fails because _id is null
insert into v_tab(_name) values ('');


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Tom Lane
Kohei KaiGai kai...@kaigai.gr.jp writes:
 I briefly checked your updates.
 Even though it is not described in the commit-log, I noticed a
 problematic change.

 This commit reverts create_plan_recurse() to a static function.

Yes.  I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful.  (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)

 It means an extension
 cannot build its child plan nodes, even if it wants to add custom-join logic.
 Please assume a simple case below:
   SELECT * FROM t0, t1 WHERE t0.a = t1.x;

 An extension adds a custom join path, then its PlanCustomPath method will be
 called back to create a plan node once it gets chosen by the planner.
 The role of PlanCustomPath is to construct its own plan-node, and also the
 plan-nodes of the source relations.
 If create_plan_recurse() is static, we have no way to initialize the
 plan-nodes for the t0 and t1 scans, even if the join logic itself is more
 powerful than the built-in ones.

I find this argument quite unconvincing, because even granting that this
is an appropriate way to create child nodes of a CustomScan, there is a
lot of core code besides createplan.c that would not know about those
child nodes either.

As a counterexample, suppose that your cool-new-join-method is capable of
joining three inputs at once.  You could stick two of the children into
lefttree and righttree perhaps, but where are you gonna put the other?

This comes back to the fact that trying to wedge join behavior into scan
node types was a pretty bad idea (as evidenced by the entirely bogus
decision that now scanrelid can be zero, which I rather doubt you've found
all the places that that broke).  Really there should have been a new
CustomJoin node or something like that.  If we'd done that, it would be
possible to design that node type more like Append, with any number of
child nodes.  And we could have set things up so that createplan.c knows
about those child nodes and takes care of the recursion for you; it would
still not be a good idea to expose create_plan_recurse and hope that
callers of that would know how to use it correctly.

I am still more than half tempted to say we should revert this entire
patch series and hope for a better design to be submitted for 9.6.
In the meantime, though, poking random holes in the modularity of core
code is a poor substitute for having designed a well-thought-out API.

A possible compromise that we could perhaps still wedge into 9.5 is to
extend CustomPath with a List of child Paths, and CustomScan with a List
of child Plans, which createplan.c would know to build from the Paths,
and other modules would then also be aware of these children.  I find that
uglier than a separate join node type, but it would be tolerable I guess.
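Sketched out, that compromise might look roughly like this (a sketch only;
field names are assumptions, not committed API):

typedef struct CustomPath
{
    Path        path;
    uint32      flags;
    List       *custom_paths;   /* child Path nodes, built by the planner */
    const struct CustomPathMethods *methods;
} CustomPath;

typedef struct CustomScan
{
    Scan        scan;
    uint32      flags;
    List       *custom_plans;   /* child Plans, built by createplan.c from
                                 * the corresponding custom_paths */
    const struct CustomScanMethods *methods;
} CustomScan;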

regards, tom lane




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Andres Freund
On 2015-05-10 22:51:33 -0400, Robert Haas wrote:
  And there are definitely some things
  around that pretty much only still exist because changing them would
  break too much stuff.
 
 Such as what?

Without even thinking about it:
* linitial vs lfirst vs lnext. That thing still induces an impedance
  mismatch when reading code for me, and I believe a good number of
  other people.
* Two 'string buffer' APIs with essentially only minor differences.
* A whole bunch of libpq APIs. Admittedly that's a bit more exposed than
  lots of backend only things.
* The whole V0 calling convention that makes it so much easier to get
  odd crashes.
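(To illustrate that last point: a V0 function is called through an unchecked
C signature, so an argument-count or type mismatch silently reads garbage.
The V1 convention passes everything through a checked call-info struct; the
classic minimal example:)

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
    int32   arg = PG_GETARG_INT32(0);   /* typed accessor via fmgr */

    PG_RETURN_INT32(arg + 1);
}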

Admittedly that's all I could come up with without having to think. But I do
vaguely remember a lot of things we did not do because of bwcompat
concerns.




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Kouhei Kaigai
 On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
  On Sun, May 10, 2015 at 8:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   This commit reverts create_plan_recurse() to a static function.
   Yes.  I am not convinced that external callers should be calling that,
   and would prefer not to enlarge createplan.c's API footprint without a
   demonstration that this is right and useful.  (This is one of many
   ways in which this patch is suffering from having gotten committed
   without submitted use-cases.)
 
 Wasn't there a submitted use case? IIRC Kaigai had referenced some
 pg-strom (?) code using it?
 
 I'm failing to see how create_plan_recurse() being exposed externally is
 related to having gotten committed without submitted use-cases.  Even if
 the submitted code, presumably as simple as possible, doesn't use it,
 that's not proof that less simple code does not need it.

Yes, the PG-Strom code uses create_plan_recurse() to construct the child
plan nodes of its GPU-accelerated custom-join logic, once that path gets
chosen.  There is nothing special here: it calls create_plan_recurse() on
the underlying inner/outer paths, exactly as the built-in join paths do.
It would not be difficult to submit it as a working example; however, its
total code size (excluding GPU code) is 25K lines at this moment.

I'm not certain it qualifies as a simple example.

  Your unwillingness to make functions global or to stick PGDLLIMPORT
  markings on variables that people want access to is hugely
  handicapping extension authors.  Many people have complained about
  that on multiple occasions.  Frankly, I find it obstructionist and
  petty.
 
 While I don't find the tone of the characterization super helpful, I do
 tend to agree that we're *far* too conservative on that end.  I've now
 seen a significant number of extensions that copied large swathes of code
 just to cope with individual functions not being available.  And even
 cases where that led to minor forks with such details changed.

It seems I may have to join those ranks.

 I know that I'm fearful of asking for functions being made
 public. Because it'll invariably get into a discussion of merits that's
 completely out of proportion with the size of the change.  And if I, who
 has been on the list for a while now, am afraid in that way, you can
 be sure that others won't even dare to ask, let alone argue their way
 through.
 
 I think the problem is that during development the default often is to
 create functions as static if they're used only in one file.  Which is
 fine.  But it really doesn't work if it's a larger battle to change
 individual instances.  Besides the pain of having to wait for the next
 major release...
 
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com





Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Kohei KaiGai
Tom,

I briefly checked your updates.
Even though it is not described in the commit-log, I noticed a
problematic change.

This commit reverts create_plan_recurse() to a static function. It means an
extension cannot build its child plan nodes, even if it wants to add
custom-join logic.
Please assume a simple case below:
  SELECT * FROM t0, t1 WHERE t0.a = t1.x;

An extension adds a custom join path, then its PlanCustomPath method will be
called back to create a plan node once it gets chosen by the planner.
The role of PlanCustomPath is to construct its own plan-node, and also the
plan-nodes of the source relations.
If create_plan_recurse() is static, we have no way to initialize the
plan-nodes for the t0 and t1 scans, even if the join logic itself is more
powerful than the built-in ones.

This was already discussed upthread, and was the consensus.
Please restore create_plan_recurse() to non-static, as in the initial commit.

Also, regarding the *_scan_tlist changes:

 I don't have time to pursue this idea right now, but I think it would be
 a good change to squeeze into 9.5, just so that we could have some test
 coverage on those parts of this patch.

Do you want just a test purpose module and regression test cases?

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] default value doesn't get applied in this situation

2015-05-10 Thread Tom Lane
Aliouii Ali aliouii@aol.fr writes:
 here is a test case:

This is not supposed to insert a default.  Attach a default expression
to the view column, if you want inserts on the view to have defaults.
ALTER VIEW ... ALTER COLUMN ... SET DEFAULT is the way.
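For the test case upthread, that would be (assuming the sequence name that
bigserial generates for tab._id):

ALTER VIEW v_tab ALTER COLUMN _id SET DEFAULT nextval('tab__id_seq');
insert into v_tab(_name) values ('');  -- _id now comes from the sequence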

(Note that in recent versions of PG, that view would be auto-insertable
so you would not need to bother with the trigger.  But the situation
with defaults hasn't changed, and won't because it would be a backwards
compatibility break with no real value-add.)

regards, tom lane




Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 I'm not sure what exactly to use as a performance benchmark
 here. For now I chose
 SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, 
 generate_series(1, 1000) repeat(i);
 that'll hit array_out, which uses iterators.

Hmm, probably those results are swamped by I/O functions though.
I'd suggest trying something that exercises array_map(), which
it looks like means doing an array coercion.  Perhaps like so:

do $$
declare a int4[];
x int;
begin
  a := array(select generate_series(1,1000));
  for i in 1..10 loop
x := array_length(a::int8[], 1);
  end loop;
end$$;

Anyway, thanks for poking at it!

regards, tom lane




Re: [HACKERS] multixacts woes

2015-05-10 Thread Robert Haas
On Sun, May 10, 2015 at 1:40 PM, Noah Misch n...@leadboat.com wrote:
 I don't know whether this deserves prompt remediation, but if it does, I would
 look no further than the hard-coded 25% figure.  We permit users to operate
 close to XID wraparound design limits.  GUC maximums force an anti-wraparound
 vacuum at no later than 93.1% of design capacity.  XID assignment warns at
 99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
 pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
 50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
 bulkiest mandatory cushion yet, an anti-wraparound vacuum when
 pg_multixact/members is just 25% full.

That's certainly one possible approach.  I had discounted it because
you can't really get more than a small multiple out of it, but getting
2-3x more room might indeed be enough to help some people quite a bit.
Just raising the threshold from 25% to say 40% would buy back a
healthy amount.

Or, as you suggest, we could just add a GUC.
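(For comparison, the offsets side is already tunable; only the member-space
threshold is hard-coded.  A new GUC would presumably expose the 25% figure
the same way this existing one exposes the offsets limit:

SHOW autovacuum_multixact_freeze_max_age;  -- 400000000 by default
)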

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Robert Haas
On Sun, May 10, 2015 at 9:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Your unwillingness to make functions global or to stick PGDLLIMPORT
 markings on variables that people want access to is hugely
 handicapping extension authors.  Many people have complained about
 that on multiple occasions.  Frankly, I find it obstructionist and
 petty.

 Sure, we could export every last static function in the core code,
 and extension authors would rejoice ... while development on the core
 code basically stops for fear of breaking extensions.  It's important
 not to export things that we don't have to, especially when doing so
 is really just a quick-n-dirty substitute for doing things properly.

Please name EVEN ONE instance in which core development has been
prevented for fear of changing a function API.  Sure, we take those
things into consideration, like trying to ensure that there will be
type conflicts until people update their code, but I cannot recall a
single instance in six and a half years of working on this project
where that's been a real problem.  I think this concern is entirely
hypothetical.  Besides, no one has ever proposed making every static
function public.  It's been proposed a handful of times for limited
classes of functions - in this case ONE - and you've fought it every
time despite clear consensus on the other side.  I find that highly
regrettable and I'm very sure I'm not the only one.

I notice that you carefully didn't answer the other part of my
question: what gives you the right to revert my commits without
discussion or consensus, and do I have an equal right to change it
back?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] unaddressable bytes in BRIN

2015-05-10 Thread Alvaro Herrera
Andres Freund just forwarded me a valgrind error report that Peter
Geoghegan noticed:

==29892== Unaddressable byte(s) found during client check request
==29892==at 0x7D1317: PageAddItem (bufpage.c:314)
==29892==by 0x468106: brin_doinsert (brin_pageops.c:315)
==29892==by 0x4671A5: form_and_insert_tuple (brin.c:1178)
==29892==by 0x466006: brinbuildCallback (brin.c:596)
==29892==by 0x53F6E4: IndexBuildHeapRangeScan (index.c:2548)
==29892==by 0x53EC19: IndexBuildHeapScan (index.c:2161)
==29892==by 0x466443: brinbuild (brin.c:694)
==29892==by 0x92F09F: OidFunctionCall3Coll (fmgr.c:1649)
==29892==by 0x53E924: index_build (index.c:2024)
==29892==by 0x53D5FC: index_create (index.c:1099)
==29892==by 0x60B3B7: DefineIndex (indexcmds.c:605)
==29892==by 0x7E2142: ProcessUtilitySlow (utility.c:1203)
==29892==  Address 0xccffd86 is 5,270 bytes inside a block of size 8,192 alloc'd
==29892==at 0x4C2AB80: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29892==by 0x950425: AllocSetAlloc (aset.c:847)
==29892==by 0x9530A5: palloc (mcxt.c:821)
==29892==by 0x4C6A32: heap_tuple_untoast_attr (tuptoaster.c:215)
==29892==by 0x9302F2: pg_detoast_datum (fmgr.c:2238)
==29892==by 0x8794C8: numeric_lt (numeric.c:2060)
==29892==by 0x92E211: FunctionCall2Coll (fmgr.c:1323)
==29892==by 0x46C441: brin_minmax_add_value (brin_minmax.c:113)
==29892==by 0x92E49C: FunctionCall4Coll (fmgr.c:1375)
==29892==by 0x466108: brinbuildCallback (brin.c:618)
==29892==by 0x53F6E4: IndexBuildHeapRangeScan (index.c:2548)
==29892==by 0x53EC19: IndexBuildHeapScan (index.c:2161)

What I think this means is that during an index build
brin_minmax_add_value() called numeric_lt() which detoasted one of its
input values; later, brin_doinsert() inserts a tuple containing the
value, but it tries to use more bytes than were allocated.  I haven't
had time to actually study what is going on here, but wanted to archive
this publicly.  (Value detoasting evidently plays a role here, but I
don't know how.)
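A hypothetical reconstruction of the kind of workload involved (an assumed
scenario, not a confirmed reproducer): a numeric column whose values are
wide enough to be compressed/toasted, summarized by a BRIN minmax index.

CREATE TABLE brin_numeric_test (n numeric);
INSERT INTO brin_numeric_test
    SELECT (10::numeric ^ 20000) + g FROM generate_series(1, 1000) g;
CREATE INDEX brin_numeric_idx ON brin_numeric_test USING brin (n);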

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-10 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 [ EvalPlanQual-v6.patch ]

I've started to study this in a little more detail, and I'm not terribly
happy with some of the API decisions in it.

In particular, I find the addition of void *fdw_state to ExecRowMark
to be pretty questionable.  That does not seem like a good place to keep
random state.  (I realize that WHERE CURRENT OF keeps some state in
ExecRowMark, but that's a crock, not something to emulate.)  ISTM that in
most scenarios, the state that LockForeignRow/FetchForeignRow would like
to get at is probably the FDW state associated with the ForeignScanState
that the tuple came from.  Which this API doesn't help with particularly.
I wonder if we should instead add a ScanState* field and expect the
core code to set that up (ExecOpenScanRelation could do it with minor
API changes...).

I'm also a bit tempted to pass the TIDs to LockForeignRow and
FetchForeignRow as Datums not ItemPointers.  We have the Datum format
available already at the call sites, so this is free as far as the core
code is concerned, and would only cost another line or so for the FDWs.
This is by no means sufficient to allow FDWs to use some other type than
tid for row identifiers; but it would be a down payment on that problem,
and at least would avoid nailing the rowids-are-tids assumption into yet
another global API.
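A sketch of what those callback shapes could look like (names and details
assumed, not settled API):

typedef bool (*LockForeignRow_function) (EState *estate,
                                         ExecRowMark *erm,
                                         Datum rowid);

typedef HeapTuple (*FetchForeignRow_function) (EState *estate,
                                               ExecRowMark *erm,
                                               Datum rowid,
                                               bool *updated);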

Thoughts?

Also, as I mentioned, I'd be a whole lot happier if we had a way to test
this...

regards, tom lane




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Robert Haas
On Sun, May 10, 2015 at 8:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kohei KaiGai kai...@kaigai.gr.jp writes:
 I briefly checked your updates.
 Even though it is not described in the commit-log, I noticed a
 problematic change.

  This commit reverts create_plan_recurse() to a static function.

 Yes.  I am not convinced that external callers should be calling that,
 and would prefer not to enlarge createplan.c's API footprint without a
 demonstration that this is right and useful.  (This is one of many
 ways in which this patch is suffering from having gotten committed
 without submitted use-cases.)

I really think that reverting somebody else's committed change without
discussion is inappropriate.  If I don't like the fact that you
reverted this change, can I go revert it back?

Your unwillingness to make functions global or to stick PGDLLIMPORT
markings on variables that people want access to is hugely
handicapping extension authors.  Many people have complained about
that on multiple occasions.  Frankly, I find it obstructionist and
petty.

If you want to improve the design of this so that it does the same
things more elegantly, fine: I'll get out of the way.  If you just
want to make things impossible that the patch previously made
possible, I strongly object to that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Async execution of postgres_fdw.

2015-05-10 Thread Kyotaro HORIGUCHI
Hello.

 I've gone ahead and marked this as Rejected.  The concept of async
 execution of postgres_fdw is certainly still open and a worthwhile goal,
 but this implementation isn't the way to achieve that.

That sounds fair. I'm satisfied that we have agreed that the goal
is worthwhile. I'll post another implementation soon.

Thank you.

 * Tom Lane (t...@sss.pgh.pa.us) wrote:
  Stephen Frost sfr...@snowman.net writes:
   I'm all for improving performance of postgres_fdw and would like to see
   us support sending queries off to be worked asyncronously, but starting
   execution on the remote server during ExecInitNode is against the
   documented FDW API spec.  I discussed exactly this issue over a year
   ago here:
  
   http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net
  
   Sadly, there weren't any direct responses to that email, but I do recall
   having a discussion on another thread (or in person?) with Tom where we
   ended up agreeing that we can't simply remove that requirement from the
   docs or the API.
  
  Yeah.  There are at least a couple of reasons why not:
 
 Thanks for the reminders of those.
 
  Also, so far as a quick review of the actual patch goes, I would really
  like to see this lose the PFC wrapper layer, which accounts for 95% of
  the code churn in the patch and doesn't seem to add any actual value.
  What it does add is unchecked malloc failure conditions.
 
 Agreed, the wrapper isn't doing anything particularly useful; I had
 started out thinking that would be my first comment until it became
 clear where all the performance improvement was coming from.
 
 I've gone ahead and marked this as Rejected.  The concept of async
 execution of postgres_fdw is certainly still open and a worthwhile goal,
 but this implementation isn't the way to achieve that.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Andres Freund
On 2015-05-10 21:53:45 -0400, Robert Haas wrote:
 Please name EVEN ONE instance in which core development has been
 prevented for fear of changing a function API.

Even *moving* function declarations to a different file has been loudly
and repeatedly complained about... And there are definitely some things
around that pretty much only still exist because changing them would
break too much stuff.

But.

I don't think that's a reason to not expose more functions
externally. Because the usual consequence of not exposing them is that
either ugly workarounds will be found, or code will just be copy-pasted
around. That's not in any way better, and much likely to be worse.

I'm not saying that we shouldn't use judgement, but I do think that the
current approach ridicules our vaunted extensibility in many cases.

Greetings,

Andres Freund




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Robert Haas
On Sun, May 10, 2015 at 10:37 PM, Andres Freund and...@anarazel.de wrote:
 On 2015-05-10 21:53:45 -0400, Robert Haas wrote:
 Please name EVEN ONE instance in which core development has been
 prevented for fear of changing a function API.

 Even *moving* function declarations to a different file has been loudly
 and repeatedly complained about...

Moving declarations is a lot more likely to break compiles than adding
declarations.  But even the 9.3 header file reorganizations, which
broke enough compiles to be annoying, were only annoying, not a
serious problem for anyone.  I doubted whether that stuff was worth
changing, but that's just because I don't really get excited about
partial recompiles.

 And there are definitely some things
 around that pretty much only still exist because changing them would
 break too much stuff.

Such as what?

 But.

 I don't think that's a reason to not expose more functions
 externally. Because the usual consequence of not exposing them is that
 either ugly workarounds will be found, or code will just be copy-pasted
 around. That's not in any way better, and much likely to be worse.

Yes.

 I'm not saying that we shouldn't use judgement, but I do think that the
 current approach ridicules our vaunted extensibility in many cases.

Double yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel Seq Scan

2015-05-10 Thread Robert Haas
On Thu, May 7, 2015 at 3:23 AM, Amit Kapila amit.kapil...@gmail.com wrote:
  I observed one issue while working on this review comment.  When we
  try to destroy the parallel setup via ExecEndNode (due to a Limit
  node, it could not be destroyed after consuming all tuples), it waits
  for parallel workers to finish (WaitForParallelWorkersToFinish()), while
  the parallel workers are waiting for the master backend to drain their
  full queues.  I think in such a case the master backend needs to inform
  the workers either when the scan is discontinued due to the Limit node
  or while waiting for parallel workers to finish.

 Isn't this why TupleQueueFunnelShutdown() calls shm_mq_detach()?
 That's supposed to unstick the workers; any impending or future writes
 will just return SHM_MQ_DETACHED without waiting.

 Okay, that can work if we call it in ExecEndNode() before
 WaitForParallelWorkersToFinish(), however what if we want to do something
 like TupleQueueFunnelShutdown() when the Limit node decides to stop
 processing the outer node?  We could traverse the whole plan tree and find
 the nodes where parallel workers need to be stopped, but I don't think
 that's a good way to handle it.  If we don't want to stop workers from
 processing until ExecutorEnd()->ExecEndNode(), then workers will keep
 running until that time and it won't be easy to get instrumentation/buffer
 usage information from them (workers fill in such information for the
 master backend after execution is complete), as that is done before
 ExecutorEnd().  For EXPLAIN ANALYZE, we can ensure that workers are
 stopped before fetching that information from the Funnel node, but the
 same is not easy for the buffer usage stats required by plugins, as those
 operate at the ExecutorRun() and ExecutorFinish() level, where we don't
 have direct access to node-level information.  You can refer to
 pgss_ExecutorEnd(), which completes the storage of stats information
 before calling ExecutorEnd().  Offhand, I could not think of a good way
 to do this, but one crude way could be to introduce a new API
 (ParallelExecutorEnd()) for such plugins, which would need to be called
 before completing the stats accumulation.  This API would call
 ExecEndPlan() if the parallelModeNeeded flag is set and allow
 accumulation of stats (InstrStartNode()/InstrStopNode()).

OK, so if I understand you here, the problem is what to do about an
orphaned worker.  The Limit node just stops fetching from the lower
nodes, and those nodes don't get any clue that this has happened, so
their workers just sit there until the end of the query.  Of course,
that happens already, but it doesn't usually hurt very much, because
the Limit node usually appears at or near the top of the plan.

It could matter, though.  Suppose the Limit is for a subquery that has
a Sort somewhere (not immediately) beneath it.  My guess is the Sort's
tuplestore will stick around until after the subquery finishes
executing for as long as the top-level query is executing, which in
theory could be a huge waste of resources.  In practice, I guess
people don't really write queries that way.  If they did, I think we'd
have already developed some general method for fixing this sort of
problem.

I think it might be better to try to solve this problem in a more
localized way.  Can we arrange for planstate->instrumentation to point
directly into the DSM, instead of copying the data over later?  That
seems like it might help, or perhaps there's another approach.
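A minimal sketch of that idea (everything here is hypothetical, including
the toc key and the per-node slot layout; it only illustrates the shape):

static void
attach_instrumentation_to_dsm(PlanState *planstate, shm_toc *toc,
                              int plan_node_id)
{
    Instrumentation *slots;

    /* PARALLEL_KEY_INSTRUMENT is an assumed key, not existing API */
    slots = (Instrumentation *) shm_toc_lookup(toc, PARALLEL_KEY_INSTRUMENT);

    /* worker updates become visible to the leader without a copy step */
    planstate->instrument = &slots[plan_node_id];
}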

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Andres Freund
On 2015-05-10 12:09:41 -0400, Tom Lane wrote:
  * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't
buy the argument that turning them into functions will be slower. I'd
bet the contrary on common platforms.

 Perhaps; do you want to do some testing and see?

I've added new iterator functions using an on-stack state variable and
array_iter_setup/next functions pretty analogous to the macros. And then
converted arrayfuncs.c to use them.

Codesize before introducing inline functions:

assert:
   text    data     bss     dec     hex filename
8142400   50562   295952 8488914  8187d2 src/backend/postgres
optimize:
   text    data     bss     dec     hex filename
6892928   50022   295920 7238870  6e74d6 src/backend/postgres

After:

assert:
   text    data     bss     dec     hex filename
8133040   50562   295952 8479554  816342 src/backend/postgres
optimize:
   text    data     bss     dec     hex filename
6890256   50022   295920 7236198  6e6a66 src/backend/postgres

That's a small decrease.

I'm not sure what exactly to use as a performance benchmark
here. For now I chose
SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, 
generate_series(1, 1000) repeat(i);
that'll hit array_out, which uses iterators.

pgbench -P 10 -h /tmp -p 5440 postgres -n -f /tmp/bench.sql -c 4 -T 60
(I chose parallel because it'll show icache efficiency differences)

before, best of four:
tps = 4.921260 (including connections establishing)

after, best of four:
tps = 5.046437 (including connections establishing)

That's a relatively small difference. I'm not surprised, I'd not have
expected anything major.

Personally I think something roughly along those lines is both more
robust and easier to maintain, even if we possibly need to protect against
inline functions not being available.


Similarly using inline funcs for AARR_NDIMS/HASNULL does not appear to
hamper performance and gets rid of the multiple evaluation risk.
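(A generic illustration of the multiple-evaluation hazard, unrelated to the
actual AARR definitions:

#define SQUARE_MACRO(x) ((x) * (x))     /* SQUARE_MACRO(f()) calls f() twice */

static inline int
square_inline(int x)
{
    return x * x;                       /* argument evaluated exactly once */
}

With any modern compiler the inline function costs the same as the macro.)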

These patches are obviously WIP. Especially with the iter stuff it's
possible that the concept could be extended a bit further.

Greetings,

Andres Freund
From 1fe208cda8df5341794ef6b76e11578ecd46cdd8 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 11 May 2015 02:43:06 +0200
Subject: [PATCH 1/2] WIP: Use inline functions for iteration.

---
 src/backend/utils/adt/arrayfuncs.c | 68 +
 src/include/c.h|  2 +
 src/include/utils/array.h  | 78 +-
 3 files changed, 115 insertions(+), 33 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 26fa648..287aca9 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1037,7 +1037,7 @@ array_out(PG_FUNCTION_ARGS)
 	int			ndim,
 			   *dims,
 			   *lb;
-	ARRAY_ITER	ARRAY_ITER_VARS(iter);
+	array_iter	iter;
 	ArrayMetaState *my_extra;
 
 	/*
@@ -1105,7 +1105,7 @@ array_out(PG_FUNCTION_ARGS)
 	needquotes = (bool *) palloc(nitems * sizeof(bool));
 	overall_length = 1;			/* don't forget to count \0 at end. */
 
-	ARRAY_ITER_SETUP(iter, v);
+	array_iter_setup(&iter, v);
 
 	for (i = 0; i < nitems; i++)
 	{
@@ -1114,7 +1114,8 @@ array_out(PG_FUNCTION_ARGS)
 		bool		needquote;
 
 		/* Get source element, checking for NULL */
-		ARRAY_ITER_NEXT(iter, i, itemvalue, isnull, typlen, typbyval, typalign);
+		itemvalue = array_iter_next(&iter, i, &isnull,
+	typlen, typbyval, typalign);
 
 		if (isnull)
 		{
@@ -1542,7 +1543,7 @@ array_send(PG_FUNCTION_ARGS)
 			   *dim,
 			   *lb;
 	StringInfoData buf;
-	ARRAY_ITER	ARRAY_ITER_VARS(iter);
+	array_iter	iter;
 	ArrayMetaState *my_extra;
 
 	/*
@@ -1597,7 +1598,7 @@ array_send(PG_FUNCTION_ARGS)
 	}
 
 	/* Send the array elements using the element's own sendproc */
-	ARRAY_ITER_SETUP(iter, v);
+	array_iter_setup(&iter, v);
 
 	for (i = 0; i < nitems; i++)
 	{
@@ -1605,7 +1606,8 @@ array_send(PG_FUNCTION_ARGS)
 		bool		isnull;
 
 		/* Get source element, checking for NULL */
-		ARRAY_ITER_NEXT(iter, i, itemvalue, isnull, typlen, typbyval, typalign);
+		itemvalue = array_iter_next(&iter, i, &isnull,
+	typlen, typbyval, typalign);
 
 		if (isnull)
 		{
@@ -3105,7 +3107,7 @@ array_map(FunctionCallInfo fcinfo, Oid retType, ArrayMapState *amstate)
 	int			typlen;
 	bool		typbyval;
 	char		typalign;
-	ARRAY_ITER	ARRAY_ITER_VARS(iter);
+	array_iter	iter;
 	ArrayMetaState *inp_extra;
 	ArrayMetaState *ret_extra;
 
@@ -3165,7 +3167,7 @@ array_map(FunctionCallInfo fcinfo, Oid retType, ArrayMapState *amstate)
 	nulls = (bool *) palloc(nitems * sizeof(bool));
 
 	/* Loop over source data */
-	ARRAY_ITER_SETUP(iter, v);
+	array_iter_setup(&iter, v);
 	hasnulls = false;
 
 	for (i = 0; i < nitems; i++)
@@ -3173,8 +3175,8 @@ array_map(FunctionCallInfo fcinfo, Oid retType, ArrayMapState *amstate)
 		bool		callit = true;
 
 		/* Get source element, checking for NULL */
-		ARRAY_ITER_NEXT(iter, i, 

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Andres Freund
On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
 On Sun, May 10, 2015 at 8:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   This commit reverts create_plan_recurse() as static function.
  Yes.  I am not convinced that external callers should be calling that,
  and would prefer not to enlarge createplan.c's API footprint without a
  demonstration that this is right and useful.  (This is one of many
  ways in which this patch is suffering from having gotten committed
  without submitted use-cases.)

Wasn't there a submitted use case? IIRC Kaigai had referenced some
pg-strom (?) code using it?

I'm failing to see how create_plan_recurse() being exposed externally is
related to having gotten committed without submitted use-cases.  Even if
the submitted code, presumably as simple as possible, doesn't use it,
that's not proof that less simple code does not need it.

 Your unwillingness to make functions global or to stick PGDLLIMPORT
 markings on variables that people want access to is hugely
 handicapping extension authors.  Many people have complained about
 that on multiple occasions.  Frankly, I find it obstructionist and
 petty.

While I don't find the tone of the characterization super helpful, I do
tend to agree that we're *far* too conservative on that end.  I've now
seen a significant number of extensions that copied large swathes of code
just to cope with individual functions not being available.  And even
cases where that led to minor forks with such details changed.

I know that I'm fearful of asking for functions being made
public. Because it'll invariably get into a discussion of merits that's
completely out of proportion with the size of the change.  And if I, who
has been on the list for a while now, am afraid in that way, you can
be sure that others won't even dare to ask, let alone argue their way
through.

I think the problem is that during development the default often is to
create functions as static if they're used only in one file.  Which is
fine.  But it really doesn't work if it's a larger battle to change
individual instances.  Besides the pain of having to wait for the next
major release...

Greetings,

Andres Freund




Re: [HACKERS] feature freeze and beta schedule

2015-05-10 Thread Andres Freund
On 2015-05-01 18:37:23 +0200, Andres Freund wrote:
 * Multivariate statistics
   This is not intended to be committed this CF.
   => I'd like to mark this as returned with (little) feedback.

 * Avoiding plan disasters with LIMIT
   I'm not enthused by the approach, it's disabled by default though. So
   it might not be too bad to just do it. Would probably have been a good
   idea to discuss the patch in a separate thread.
   => ?

 * Turning off HOT for larger SQL queries
   Seems to have degenerated into a discussion of not really related
   things. I personally would vote for committing something close to what
   Simon proposed last *directly at the beginning* of the next cycle.
   => Move?

 * Unique Joins
   This seems to require more work and came in pretty late
   => Returned with feedback.

 * INNER JOIN removals
   Seems far too controversial to consider committing in 9.5.
   => Returned (or even rejected :()

 * Async execution of postgres_fdw.
   Later iterations of the patch haven't gotten much review yet. The
   original version of the patch is just from 2014-12-15.
   => Should imo be moved to the next CF

 
 * Allow snapshot too old error, to prevent bloat
   
 http://archives.postgresql.org/message-id/1361166406.1897609.1424371443904.JavaMail.yahoo%40mail.yahoo.com
   talked about a new version that afaics never materialized
   => Returned with feedback

 * Parallel Seq scan
   In my opinion the topic has progressed greatly. But at the same time
   it doesn't seem like it's in a state we should consider for 9.5.
   => Return?

 * logical column ordering (WIP)
   This pretty clearly isn't 9.5 material.
   => Return

 * Support ORDER BY in CREATE FUNCTION for Set Returning Functions
   Uhm. I think the outcome of the discussion so far wasn't really
   favorable to the idea as proposed.
   => Rejected

Marked as such.




Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Andres Freund
On 2015-05-10 21:09:14 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  I'm not sure what exactly to use as a performance benchmark
  here. For now I chose
  SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, 
  generate_series(1, 1000) repeat(i);
  that'll hit array_out, which uses iterators.
 
 Hmm, probably those results are swamped by I/O functions though.

I did check with a quick profile, and the iteration itself is a
significant part of the total execution time.

 I'd suggest trying something that exercises array_map(), which
 it looks like means doing an array coercion.  Perhaps like so:

 do $$
 declare a int4[];
 x int;
 begin
   a := array(select generate_series(1,1000));
   for i in 1..10 loop
 x := array_length(a::int8[], 1);
   end loop;
 end$$;

with the loop count set to 1 instead, I get:
before:
tps = 20.940092 (including connections establishing)
after:
tps = 20.568730 (including connections establishing)

Greetings,

Andres Freund




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Your unwillingness to make functions global or to stick PGDLLIMPORT
 markings on variables that people want access to is hugely
 handicapping extension authors.  Many people have complained about
 that on multiple occasions.  Frankly, I find it obstructionist and
 petty.

Sure, we could export every last static function in the core code,
and extension authors would rejoice ... while development on the core
code basically stops for fear of breaking extensions.  It's important
not to export things that we don't have to, especially when doing so
is really just a quick-n-dirty substitute for doing things properly.

regards, tom lane




Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Andres Freund
On 2015-05-10 12:09:41 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  Looking at this. First reading the patch to understand the details.
 
  * The VARTAG_IS_EXPANDED(tag) trick in VARTAG_SIZE is unlikely to be
    beneficial: before it, the compiler could implement the whole thing as a
    computed goto or lookup table; afterwards it cannot.
 
 Well, if you're worried about the speed of VARTAG_SIZE() then the right
 thing to do would be to revert your change that made enum vartag_external
 distinct from the size of the struct, so that we could go back to just
 using the second byte of a varattrib_1b_e datum as its size.  As I said
 at the time, inserting pad bytes to force each different type of toast
 pointer to be a different size would probably be a better tradeoff than
 what commit 3682025015 did.

I doubt that'd be a net positive. Anyway, all I'm saying is that I can't
see the VARTAG_IS_EXPANDED trick being beneficial in comparison to
checking both explicit values.

  * You were rather bothered by the potential of multiple evaluations for
the ilist stuff. And now the AARR macros are full of them...
 
 Yeah, there is doubtless some added cost there.  But I think it's a better
 answer than duplicating each function in toto; the code space that that
 would take isn't free either.

Yea, duplicating would be horrid. I'm more thinking of declaring some
iterator state outside the macro, or just using an inline function.

  * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't
buy the argument that turning them into functions will be slower. I'd
bet the contrary on common platforms.
 
 Perhaps; do you want to do some testing and see?

Not exactly with great joy, but I will.

  * The list of hardwired safe ops in exec_check_rw_parameter is somewhat
sad. Don't have a better idea though.
 
 It's very sad, and it will be high on my list to improve that in 9.6.


 But I do not think it's a fatal problem to ship it that way in 9.5,
 because *as things stand today* those are the only two functions that
 could benefit anyway.  It won't really matter until we have extensions
 that want to use this mechanism.

Agreed that it's not fatal.

  ISTM that the worst case for the new situation is large arrays that
  exist as plpgsql variables but are only ever passed on.
 
 Well, more to the point, large arrays that are forced into expanded format
 (costing a conversion step) but then we never do anything with them that
 would benefit from that.  Just saying they're passed on doesn't prove
 much since the called function might or might not get any benefit.
 array_length doesn't, but some other things would.

Right. But I'm not sure it's that uncommon.

 Your example with array_agg() is interesting, since one of the things on
 my to-do list is to see whether we could change array_agg to return an
 expanded array.

Well, I chose array_agg only because it was a trivial way to generate a
large array. The values could actually come from disk or something.

 It would not be hard to make it build that representation
 directly, instead of its present ad-hoc internal state.  The trick would
 be, when can you return the internal state without an additional copy
 step?  But maybe it could return a R/O pointer ...

R/O or R/W?

  ... Expanding only in
  cases where it'd be beneficial is going to be hard.
 
 Yeah, improving that heuristic looks like a research project.  Still, even
 with all the limitations and to-do items in the patch now, I'm pretty sure
 this will be a net win for practically all applications.

I wonder if we could somehow 'mark' other toast pointers as 'expand if
useful'. I.e. have something pretty much like ExpandedObjectHeader,
except that it initially works like the indirect toast stuff.  So
eoh_context is set, but the data is still in the original datum. When
accessed via 'plain' accessors that don't know about the expanded format,
the pointed-to datum is returned; but when accessed by something
desiring the expanded version, it's expanded.  It seems that'd be
doable by expanding the new infrastructure a bit more.

Greetings,

Andres Freund




Re: [HACKERS] multixacts woes

2015-05-10 Thread Jim Nasby

On 5/8/15 1:15 PM, Robert Haas wrote:

I somehow did not realize until very recently that we
actually use two SLRUs to keep track of multixacts: one for the
multixacts themselves (pg_multixact/offsets) and one for the members
(pg_multixact/members). Confusingly, members are sometimes called
offsets, and offsets are sometimes called IDs, or multixacts.


FWIW, since I had to re-read this bit...
 * We use two SLRU areas, one for storing the offsets at which the data
 * starts for each MultiXactId in the other one.  This trick allows us to
 * store variable length arrays of TransactionIds.


Another way this could be 'fixed' would be to bump MultiXactOffset (but 
NOT MultiXactId) to uint64. That would increase the number of total 
members we could keep by a factor of 2^32. At that point wraparound 
wouldn't even be possible, because you can't have more than 2^31 members 
in an MXID (and there can only be 2^31 MXIDs). It may not be a trivial 
change though, because SLRUs are currently capped at 2^32 pages.
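The arithmetic, as a quick sanity check: even the worst case of 2^31
multixacts with 2^31 members apiece fits comfortably in a 64-bit offset.

SELECT 2::numeric ^ 31 * 2::numeric ^ 31 AS worst_case_members,  -- 2^62
       2::numeric ^ 64                   AS uint64_range;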


This probably isn't a good long-term solution, but it would eliminate 
the risk of really frequent freeze vacuums. It sounds like Josh at least 
knows some people for whom that could cause big problems.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Kouhei Kaigai
 Kohei KaiGai kai...@kaigai.gr.jp writes:
  I briefly checked your updates.
  Even though it is not described in the commit-log, I noticed a
  problematic change.
 
  This commit reverts create_plan_recurse() to a static function.
 
 Yes.  I am not convinced that external callers should be calling that,
 and would prefer not to enlarge createplan.c's API footprint without a
 demonstration that this is right and useful.  (This is one of many
 ways in which this patch is suffering from having gotten committed
 without submitted use-cases.)

Hmm, I see it is an intentional change.

  It means an extension
  cannot build its child plan nodes, even if it wants to add custom-join
  logic.
  Please assume a simple case below:
    SELECT * FROM t0, t1 WHERE t0.a = t1.x;
 
  An extension adds a custom join path, then its PlanCustomPath method will be
  called back to create a plan node once it gets chosen by the planner.
  The role of PlanCustomPath is to construct its own plan-node, and also the
  plan-nodes of the source relations.
  If create_plan_recurse() is static, we have no way to initialize the
  plan-nodes for the t0 and t1 scans, even if the join logic itself is more
  powerful than the built-in ones.
 
 I find this argument quite unconvincing, because even granting that this
 is an appropriate way to create child nodes of a CustomScan, there is a
 lot of core code besides createplan.c that would not know about those
 child nodes either.

 As a counterexample, suppose that your cool-new-join-method is capable of
 joining three inputs at once.  You could stick two of the children into
 lefttree and righttree perhaps, but where are you gonna put the other?

 This comes back to the fact that trying to wedge join behavior into scan
 node types was a pretty bad idea (as evidenced by the entirely bogus
 decision that now scanrelid can be zero, which I rather doubt you've found
 all the places that that broke).  Really there should have been a new
 CustomJoin node or something like that.  If we'd done that, it would be
 possible to design that node type more like Append, with any number of
 child nodes.  And we could have set things up so that createplan.c knows
 about those child nodes and takes care of the recursion for you; it would
 still not be a good idea to expose create_plan_recurse and hope that
 callers of that would know how to use it correctly.

 I am still more than half tempted to say we should revert this entire
 patch series and hope for a better design to be submitted for 9.6.
 In the meantime, though, poking random holes in the modularity of core
 code is a poor substitute for having designed a well-thought-out API.
 
 A possible compromise that we could perhaps still wedge into 9.5 is to
 extend CustomPath with a List of child Paths, and CustomScan with a List
 of child Plans, which createplan.c would know to build from the Paths,
 and other modules would then also be aware of these children.  I find that
 uglier than a separate join node type, but it would be tolerable I guess.

At this moment, my custom-join logic adds a dummy node so that it has just
two child nodes when it tries to join more than 3 relations.
Yes, if the CustomPath node (ForeignPath also?) can have a list of child-path
nodes whose initialization the core backend then handles, it will be more
comfortable for extensions.
I don't merely find this idea tolerable; I prefer it.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] multixacts woes

2015-05-10 Thread Noah Misch
On Fri, May 08, 2015 at 02:15:44PM -0400, Robert Haas wrote:
 My colleague Thomas Munro and I have been working with Alvaro, and
 also with Kevin and Amit, to fix bug #12990, a multixact-related data
 corruption bug.

Thanks Alvaro, Amit, Kevin, Robert and Thomas for mobilizing to get this fixed.

 1. I believe that there is still a narrow race condition that can cause
 the multixact code to go crazy and delete all of its data when
 operating very near the threshold for member space exhaustion. See
 http://www.postgresql.org/message-id/ca+tgmozihwybetx8nzzptosjprg2kcr-nawgajkzclcbvj1...@mail.gmail.com
 for the scenario and proposed fix.

For anyone else following along, Thomas's subsequent test verified this threat
beyond reasonable doubt:

http://www.postgresql.org/message-id/CAEepm=3c32vpjloo45y0c3-3kwxnv2xm4japtsvjcrd2vg0...@mail.gmail.com

 2. We have some logic that causes autovacuum to run in spite of
 autovacuum=off when wraparound threatens.  My commit
 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
 anti-wraparound protections for multixact members that exist for
 multixact IDs and for regular XIDs, but this remains an outstanding
 issue.  I believe I know how to fix this, and will work up an
 appropriate patch based on some of Thomas's earlier work.

That would be good to have, and its implementation should be self-contained.

 3. It seems to me that there is a danger that some users could see
 extremely frequent anti-mxid-member-wraparound vacuums as a result of
 this work.  Granted, that beats data corruption or errors, but it
 could still be pretty bad.  The default value of
 autovacuum_multixact_freeze_max_age is 400000000.
 Anti-mxid-member-wraparound vacuums kick in when you exceed 25% of the
 addressable space, or 1073741824 total members.  So, if your typical
 multixact has more than 1073741824/400000000 = ~2.68 members, you're
 going to see more autovacuum activity as a result of this change.
 We're effectively capping autovacuum_multixact_freeze_max_age at
 1073741824/(average size of your multixacts).  If your multixacts are
 just a couple of members (like 3 or 4) this is probably not such a big
 deal.  If your multixacts typically run to 50 or so members, your
 effective freeze age is going to drop from 400m to ~21.4m.  At that
 point, I think it's possible that relminmxid advancement might start
 to force full-table scans more often than would be required for
 relfrozenxid advancement.  If so, that may be a problem for some
 users.

I don't know whether this deserves prompt remediation, but if it does, I would
look no further than the hard-coded 25% figure.  We permit users to operate
close to XID wraparound design limits.  GUC maximums force an anti-wraparound
vacuum at no later than 93.1% of design capacity.  XID assignment warns at
99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
bulkiest mandatory cushion yet, an anti-wraparound vacuum when
pg_multixact/members is just 25% full.  The pgsql-bugs thread driving that
patch did reject making it GUC-controlled, essentially on the expectation that
25% should be adequate for everyone:

http://www.postgresql.org/message-id/CA+Tgmoap6-o_5ESu5X2mBRVht_F+KNoY+oO12OvV_WekSA=e...@mail.gmail.com
http://www.postgresql.org/message-id/20150506143418.gt2...@alvh.no-ip.org
http://www.postgresql.org/message-id/1570859840.1241196.1430928954257.javamail.ya...@mail.yahoo.com

 What can we do about this?  Alvaro proposed back-porting his fix for
 bug #8470, which avoids locking a row if a parent subtransaction
 already has the same lock.

Like Andres and yourself, I would not back-patch it.

 Another thought that occurs to me is that if we had a freeze map, it
 would radically decrease the severity of this problem, because
 freezing would become vastly cheaper.  I wonder if we ought to try to
 get that into 9.5, even if it means holding up 9.5.

Declaring that a release will wait for a particular feature has consistently
ended badly for PostgreSQL, and this feature is just in the planning stages.
If folks are ready to hit the ground running on the project, I suggest they do
so; a non-WIP submission to the first 9.6 CF would be a big accomplishment.
The time to contemplate slipping it into 9.5 comes after the patch is done.

If these aggressive ideas earn more than passing consideration, the 25%
threshold should become user-controllable after all.




Re: [HACKERS] multixacts woes

2015-05-10 Thread Andrew Dunstan


On 05/10/2015 10:30 AM, Robert Haas wrote:

2. We have some logic that causes autovacuum to run in spite of
autovacuum=off when wraparound threatens.  My commit
53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
anti-wraparound protections for multixact members that exist for
multixact IDs and for regular XIDs, but this remains an outstanding
issue.  I believe I know how to fix this, and will work up an
appropriate patch based on some of Thomas's earlier work.

I believe autovacuum=off is fortunately uncommon, but certainly getting
this issue fixed is a good idea.

Right.

I suspect it's quite a bit more common than many people imagine.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multixacts woes

2015-05-10 Thread José Luis Tallón

On 05/08/2015 09:57 PM, Josh Berkus wrote:

[snip]

It's certainly possible to have workloads triggering that, but I think
it's relatively uncommon.  In most cases I've checked, the multixact
consumption rate is much lower than the xid consumption. There are some
exceptions, but often that's pretty bad code.

I have a couple workloads in my pool which do consume mxids faster than
xids, due to (I think) exceptional numbers of FK conflicts.  It's
definitely unusual, though, and I'm sure they'd rather have corruption
protection and endure some more vacuums.


Seen corruption happen recently with OpenBravo on PostgreSQL 9.3.6 
(Debian; binaries upgraded from 9.3.2) in a cluster pg_upgraded from 9.2.4

(albeit with quite insufficient autovacuum / poorly configured Postgres)

I fear that this might be more widespread than we thought, depending on 
the exact workload/activity pattern.
If it would help, I can try to get hold of a copy of the cluster in 
question (if the customer keeps any copy at all)

If we do this, though, it
might be worthwhile to backport the multixact age function, so that
affected users can check and schedule mxact wraparound vacuums
themselves, something you currently can't do on 9.3.


Thanks,

J.L.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Robert Haas
On Sat, May 9, 2015 at 1:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I originally wanted to go quite the other way with this and check for
 join pushdown via handler X any time at least one of the two relations
 involved used handler X, even if the other one used some other handler
 or was a plain table.  In particular, it seems to me quite plausible
 to want to teach an FDW that a certain local table is replicated on a
 remote node, allowing a join between a foreign table and a plain table
 to be pushed down.

 If we did do something like that, I think a saner implementation would
 involve substituting a foreign table for the local one earlier, via view
 expansion.  So by the time we are doing join planning, there would be no
 need to consider cross-server joins anyway.

Huh?  You can't do this at rewrite time; it's very fundamentally a
planning problem.  Suppose the user wants to join A, B, and C, where A
is a plain table, B is a plain table that is replicated on a foreign
server, and C is a foreign table on that same foreign server.  If we
decide to join B to C first, we probably want to push down the join,
although maybe not, if we estimate that B JOIN C will have more rows
than C.  If we decide to join A to B first, we want to use the local
copy of B.

 This infrastructure can't be used that way anyhow,
 so maybe there's no harm in tightening it up, but I'm wary of
 circumscribing what FDW authors can do.

 Somebody who's really intent on shooting themselves in the foot can always
 use the set_join_pathlist_hook to inject paths for arbitrary joins.
 The FDW mechanism should support reasonable use cases without undue
 complication, and I doubt that what we've got now is adding anything
 except complication and risk of bugs.

 For the archives' sake, let me lay out a couple of reasons why an FDW
 that tried to allow cross-server joins would almost certainly be broken,
 and broken in security-relevant ways.  Suppose for instance that
 postgres_fdw tried to be smart and drill down into foreign tables' server
 IDs to allow joining of any two tables that have the same effective host
 name, port, database name, user name, and anything else you think would be
 relevant to its choice of connections.  The trouble with that is that the
 user mapping is context dependent, in particular one local userID might
 map to the same remote user name for two different server OIDs, while
 another might map to different user names.  So if we plan a query under
 the first userID we might decide it's okay to do the join remotely.
 Then, if we re-use that plan while executing as another userID (which is
 entirely possible) what will probably happen is that the remote join
 query will get sent off under one or the other of the remote usernames
 associated with the second local userID.  This could lead to either a
 permission failure, or a remote table access that should not be allowed
 to the current local userID.  Admittedly, such cases might be rare in
 practice, but it's still a security hole.  Also, even if the FDW is
 defensive enough to recheck full matching of the tables' connection
 properties at execution time, there's not much it could do about the
 situation except fail; it couldn't cause a re-plan to occur.

 For another case, we do not have any mechanism whereby operations like
 ALTER SERVER OPTIONS could invalidate existing plans.  Thus, even if
 the two tables' connection properties matched at plan time, there's no
 guarantee that they still match at execution.  This is probably not a
 security hole (at least not if you assume foreign-server owners are
 trusted users), but it's still a bug that exists only if someone tries
 to allow cross-server joins.

 For these reasons, I think that if an FDW tried to be laxer than tables
 must be on the same pg_foreign_server entry to be joined, the odds
 approach unity that it would be broken, and probably dangerously broken.
 So we should just make that check for the FDWs.  Anybody who thinks
 they're smarter than the average bear can use set_join_pathlist_hook,
 but they are probably wrong.

Drilling down into postgres_fdw's connection properties seems pretty
silly; the user isn't likely to create two SERVER objects that are
identical and then choose which one to use at random, and if they do,
they deserve what they get.  The universe of FDWs, however, is
potentially bigger than that.  What does a server even mean for
file_fdw, for example?  I can't think of any reason why somebody would
want to implement joins inside file_fdw, but if they did, all the
things being joined would be local files, so the server ID doesn't
really matter.  Now you may say that's a silly use case, but it's less
obviously silly if the files contain structured data, as with
cstore_fdw, yet the server ID could still be not especially relevant.
Maybe you've got servers representing filesystem directories; that
shouldn't preclude cross-server joins.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] multixacts woes

2015-05-10 Thread Robert Haas
On Fri, May 8, 2015 at 5:39 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 1. I believe that there is still a narrow race condition that cause
 the multixact code to go crazy and delete all of its data when
 operating very near the threshold for member space exhaustion. See
 http://www.postgresql.org/message-id/ca+tgmozihwybetx8nzzptosjprg2kcr-nawgajkzclcbvj1...@mail.gmail.com
 for the scenario and proposed fix.

 I agree that there is a problem here.

OK, I'm glad we now agree on that, since it seemed like you were
initially unconvinced.

 2. We have some logic that causes autovacuum to run in spite of
 autovacuum=off when wraparound threatens.  My commit
 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
 anti-wraparound protections for multixact members that exist for
 multixact IDs and for regular XIDs, but this remains an outstanding
 issue.  I believe I know how to fix this, and will work up an
 appropriate patch based on some of Thomas's earlier work.

 I believe autovacuum=off is fortunately uncommon, but certainly getting
 this issue fixed is a good idea.

Right.

 3. It seems to me that there is a danger that some users could see
 extremely frequent anti-mxid-member-wraparound vacuums as a result of
 this work.

 I agree with the idea that the long term solution to this issue is to
 make the freeze process cheaper.  I don't have any good ideas on how to
 make this less severe in the interim.  You say the fix for #8470 is not
tested thoroughly enough to back-patch it just yet, and I can get behind
 that; so let's wait until 9.5 has been tested a bit more.

Sounds good.

 Another avenue not mentioned and possibly worth exploring is making some
 more use of the multixact cache, and reuse multixacts that were
 previously issued and have the same effects as the one you're interested
 in: for instance, if you want a multixact with locking members
 (10,20,30) and you have one for (5,10,20,30) but transaction 5 has
 finished, then essentially both have the same semantics (because locks
 don't have any effect the transaction has finished) so we can use it
 instead of creating a new one.  I have no idea how to implement this;
 obviously, having to run TransactionIdIsCurrentTransactionId for each
 member on each multixact in the cache each time you want to create a new
 multixact is not very reasonable.

This sounds to me like it's probably too clever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN range operator class

2015-05-10 Thread Emre Hasegeli
 I pushed patches 04 and 07, as well as adopting some of the changes to
 the regression test in 06.  I'm afraid I caused a bit of merge pain for
 you -- sorry about that.

No problem.  I rebased the remaining ones.


brin-inclusion-v09-02-strategy-numbers.patch
Description: Binary data


brin-inclusion-v09-03-remove-assert-checking.patch
Description: Binary data


brin-inclusion-v09-05-box-vs-point-operators.patch
Description: Binary data


brin-inclusion-v09-06-inclusion-opclasses.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 Looking at this. First reading the patch to understand the details.

 * The VARTAG_IS_EXPANDED(tag) trick in VARTAG_SIZE is unlikely to
   be beneficial: before, the compiler could implement the whole thing as a
   computed goto or lookup table; afterwards it cannot.

Well, if you're worried about the speed of VARTAG_SIZE() then the right
thing to do would be to revert your change that made enum vartag_external
distinct from the size of the struct, so that we could go back to just
using the second byte of a varattrib_1b_e datum as its size.  As I said
at the time, inserting pad bytes to force each different type of toast
pointer to be a different size would probably be a better tradeoff than
what commit 3682025015 did.

 * It'd be nice if the get_flat_size comment in expandeddatum.h could
   specify whether the header size is included. That varies enough around
   toast that it seems worthwhile.

OK.
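
For the archives, the two methods in question look like this (paraphrased
from the patch; see expandeddatum.h for the authoritative comments — as I
read it, get_flat_size reports the total flattened size, varlena header
included):

typedef struct ExpandedObjectMethods
{
	/* space needed for the flat representation, including varlena header */
	Size		(*get_flat_size) (ExpandedObjectHeader *eohptr);

	/* flatten into caller-supplied memory of exactly that size */
	void		(*flatten_into) (ExpandedObjectHeader *eohptr,
								 void *result, Size allocated_size);
} ExpandedObjectMethods;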

 * You were rather bothered by the potential of multiple evaluations for
   the ilist stuff. And now the AARR macros are full of them...

Yeah, there is doubtless some added cost there.  But I think it's a better
answer than duplicating each function in toto; the code space that that
would take isn't free either.

 * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't
   buy the argument that turning them into functions will be slower. I'd
   bet the contrary on common platforms.

Perhaps; do you want to do some testing and see?

 * Not a fan of the EH_ prefix in array_expanded.c and EOH_
   elsewhere. Just looks ugly to me. Whatever.

I'm not wedded to that naming if you have a better suggestion.

 * The list of hardwired safe ops in exec_check_rw_parameter is somewhat
   sad. Don't have a better idea though.

It's very sad, and it will be high on my list to improve that in 9.6.
But I do not think it's a fatal problem to ship it that way in 9.5,
because *as things stand today* those are the only two functions that
could benefit anyway.  It won't really matter until we have extensions
that want to use this mechanism.

 * Also, a C function that is modifying a read-write expanded value
   in-place should take care to leave the value in a sane state if it
   fails partway through. - that's a pretty hefty requirement imo.

It is, which is one reason that I'm nervous about relaxing the test in
exec_check_rw_parameter.  Still, it was possible to code array_set_element
to meet that restriction without too much pain.  I'm inclined to leave the
stronger requirement in the docs for now, until we get more push-back.

 * The forced RW-RO conversion in subquery scans is a bit sad, but it
   seems like something left for later.

Yes, there are definitely some things that look like future opportunities
here.

 Somewhere in the thread you comment on the fact that it's a bit sad that
 plpgsql is the sole beneficiary of this (unless some function forces
 expansion internally).  I'd be ok to leave it at that for now. It'd be
 quite cool to get some feedback from postgis folks about the suitability
 of this for their cases.

Indeed, that's the main reason I'm eager to ship something in 9.5, even if
it's not perfect.  I hope those guys will look at it and use it, and maybe
give us feedback on ways to improve it.

 ISTM that the worst case for the new situation is large arrays that
 exist as plpgsql variables but are only ever passed on.

Well, more to the point, large arrays that are forced into expanded format
(costing a conversion step) but then we never do anything with them that
would benefit from that.  Just saying they're passed on doesn't prove
much since the called function might or might not get any benefit.
array_length doesn't, but some other things would.

Your example with array_agg() is interesting, since one of the things on
my to-do list is to see whether we could change array_agg to return an
expanded array.  It would not be hard to make it build that representation
directly, instead of its present ad-hoc internal state.  The trick would
be, when can you return the internal state without an additional copy
step?  But maybe it could return a R/O pointer ...

 ... Expanding only in
 cases where it'd be beneficial is going to be hard.

Yeah, improving that heuristic looks like a research project.  Still, even
with all the limitations and to-do items in the patch now, I'm pretty sure
this will be a net win for practically all applications.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, May 9, 2015 at 1:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 For these reasons, I think that if an FDW tried to be laxer than tables
 must be on the same pg_foreign_server entry to be joined, the odds
 approach unity that it would be broken, and probably dangerously broken.
 So we should just make that check for the FDWs.  Anybody who thinks
 they're smarter than the average bear can use set_join_pathlist_hook,
 but they are probably wrong.

 Drilling down into postgres_fdw's connection properties seems pretty
 silly; the user isn't likely to create two SERVER objects that are
 identical and then choose which one to use at random, and if they do,
 they deserve what they get.  The universe of FDWs, however, is
 potentially bigger than that.  What does a server even mean for
 file_fdw, for example?

Nothing, which is why you'd only ever create one per database, and so the
issue doesn't arise anyway.  It would only be sane to create multiple
servers per FDW if there were a meaningful difference between them.

In any case, since the existing code doesn't support remote joins
involving a local table unless you use the join-path hook, this argument
seems pretty academic.  If we tighten the test to be same-server, we will
benefit all but very weirdly designed FDWs.  Anybody who's not happy with
that can still use the hook (and I continue to maintain that they will
probably have subtle bugs, but whatever).

Another point worth making is that the coding I have in mind doesn't
really do anything with RelOptInfo.serverid except compare it for
equality.  So an FDW that wants to consider some servers interchangeable
for joining purposes could override the value at GetForeignPaths time
(ie replace serverid with the OID of a preferred server), and then it
would get GetForeignJoinPaths calls as desired.
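
A minimal sketch of that idea, where canonical_server_for() is a
hypothetical FDW-private helper rather than anything in core:

static void
myGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)
{
	/* map interchangeable servers onto one preferred OID */
	Oid			canonical = canonical_server_for(baserel->serverid);

	if (OidIsValid(canonical))
		baserel->serverid = canonical;

	/* ... then build and add ForeignPaths exactly as before ... */
}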

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Tom Lane
I've committed a cleanup patch along the lines discussed.

One thought is that we could test the nondefault-scan-tuple logic without
a lot of work by modifying the way postgres_fdw deals with columns it
decides don't need to be fetched.  Right now it injects NULL columns so
that the remote query results still match the foreign table's rowtype,
but that's not especially desirable or efficient.  What we could do
instead is generate an fdw_scan_tlist that just lists the columns we
intend to fetch.
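
Sketched out (retrieved_attrs here is a stand-in for whatever list of
fetched attribute numbers the FDW computes, so this is illustrative only):

List	   *fdw_scan_tlist = NIL;
ListCell   *lc;

foreach(lc, retrieved_attrs)	/* hypothetical list of fetched attnos */
{
	int			attnum = lfirst_int(lc);
	Oid			atttype;
	int32		atttypmod;
	Oid			attcollation;
	Var		   *var;

	get_atttypetypmodcoll(foreigntableid, attnum,
						  &atttype, &atttypmod, &attcollation);
	var = makeVar(baserel->relid, attnum,
				  atttype, atttypmod, attcollation, 0);
	fdw_scan_tlist = lappend(fdw_scan_tlist,
							 makeTargetEntry((Expr *) var,
											 list_length(fdw_scan_tlist) + 1,
											 NULL, false));
}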

I don't have time to pursue this idea right now, but I think it would be
a good change to squeeze into 9.5, just so that we could have some test
coverage on those parts of this patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #13148: Unexpected deferred EXCLUDE constraint violation on derived table

2015-05-10 Thread Tom Lane
postgres...@realityexists.net writes:
 I have a deferred EXCLUDE constraint on a derived table. Inside a
 transaction I insert a new row that conflicts with an existing one (so the
 constraint would fail if it was immediate), delete the old row and run an
 unrelated UPDATE on the new row, then try to commit. I would expect the
 commit to succeed, since there is now no conflict, but it fails with
   ERROR: conflicting key value violates exclusion constraint
 uq_derived_timeslice_dup_time_ex

Hm.  The given test case is overcomplicated; in point of fact it will fail
on any deferred exclusion constraint, eg


DROP TABLE IF EXISTS derived_timeslice CASCADE;

CREATE TABLE derived_timeslice
(
  timeslice_id integer NOT NULL,
  feature_id integer NOT NULL,
  name text,
  CONSTRAINT pk_derived_timeslice PRIMARY KEY (timeslice_id),
  CONSTRAINT uq_derived_timeslice_dup_time_ex EXCLUDE 
USING btree (feature_id WITH =)
DEFERRABLE INITIALLY DEFERRED
);

INSERT INTO derived_timeslice (timeslice_id, feature_id) VALUES (51, 1);

BEGIN;

-- Insert row that violates deferred constraint
INSERT INTO derived_timeslice (timeslice_id, feature_id) VALUES (52, 1);

-- Delete the old row - now there should be no more conflict
DELETE FROM derived_timeslice WHERE timeslice_id = 51;

-- Problem doesn't occur without an UPDATE statement
UPDATE derived_timeslice SET name = 'Updated' WHERE timeslice_id = 52;

-- This confirms there is only 1 row - no conflict
SELECT * FROM derived_timeslice;

COMMIT; -- Enforce constraint - error occurs here


The cause of the problem seems to be that the UPDATE performs a HOT update
of the new tuple, leaving in this case a dead tuple at (0,2) that is HOT
updated by (0,3).  When unique_key_recheck() is invoked for (0,2), it
believes, correctly, that it has to perform the recheck anyway ... but it
tells check_exclusion_constraint that the check is being performed for
(0,2).  So the index search inside check_exclusion_constraint finds the
live tuple at (0,3) and thinks that is a conflict.

This is reproducible clear back to 9.0 where exclusion constraints were
added.

The easiest fix seems to be to pass the HOT child's TID instead of the
TID we were called for.  (Note that the other path, for a regular unique
constraint, is correct as-is because the original TID is what the index
will know about.)
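
In sketch form, the shape of the fix: the function already uses
heap_hot_search() for its live-ness test, so the live member's TID is
available to hand to check_exclusion_constraint() (simplified; the
attached patch is authoritative, and argument lists vary by branch):

	ItemPointerData tmptid = new_row->t_self;

	/* heap_hot_search() advances tmptid to the live HOT-chain member */
	if (heap_hot_search(&tmptid, trigdata->tg_relation, SnapshotSelf, NULL))
		check_exclusion_constraint(trigdata->tg_relation, indexRel, indexInfo,
								   &tmptid, values, isnull,
								   estate, false);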

The attached patch seems to fix the problem without breaking any existing
regression tests, but I wonder if anyone can see a hole in it.

regards, tom lane

diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index e49affb..28fccaf 100644
*** a/src/backend/commands/constraint.c
--- b/src/backend/commands/constraint.c
*** unique_key_recheck(PG_FUNCTION_ARGS)
*** 89,97 
  	 * because this trigger gets queued only in response to index insertions;
  	 * which means it does not get queued for HOT updates.  The row we are
  	 * called for might now be dead, but have a live HOT child, in which case
! 	 * we still need to make the check.  Therefore we have to use
! 	 * heap_hot_search, not just HeapTupleSatisfiesVisibility as is done in
! 	 * the comparable test in RI_FKey_check.
  	 *
  	 * This might look like just an optimization, because the index AM will
  	 * make this identical test before throwing an error.  But it's actually
--- 89,98 
  	 * because this trigger gets queued only in response to index insertions;
  	 * which means it does not get queued for HOT updates.  The row we are
  	 * called for might now be dead, but have a live HOT child, in which case
! 	 * we still need to make the check --- effectively, we're applying the
! 	 * check against the live child row, although we can use the values from
! 	 * this row since by definition all columns of interest to us are the
! 	 * same.
  	 *
  	 * This might look like just an optimization, because the index AM will
  	 * make this identical test before throwing an error.  But it's actually
*** unique_key_recheck(PG_FUNCTION_ARGS)
*** 159,165 
  	{
  		/*
  		 * Note: this is not a real insert; it is a check that the index entry
! 		 * that has already been inserted is unique.
  		 */
  		index_insert(indexRel, values, isnull, &(new_row->t_self),
  	 trigdata->tg_relation, UNIQUE_CHECK_EXISTING);
--- 160,168 
  	{
  		/*
  		 * Note: this is not a real insert; it is a check that the index entry
! 		 * that has already been inserted is unique.  Passing t_self is
! 		 * correct even if t_self is now dead, because that is the TID the
! 		 * index will know about.
  		 */
  		index_insert(indexRel, values, isnull, &(new_row->t_self),
  	 trigdata->tg_relation, UNIQUE_CHECK_EXISTING);
*** unique_key_recheck(PG_FUNCTION_ARGS)
*** 168,177 
  	{
  		/*
  		 * For exclusion constraints we just do the normal check, but now it's
! 		 * okay to throw error.
  		 */
  		check_exclusion_constraint(trigdata->tg_relation, indexRel, indexInfo,