Re: [HACKERS] LATERAL quals revisited

2013-08-19 Thread Tom Lane
Some time ago, I wrote:
 I've been studying the bug reported at
 http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
 ...
 After some contemplation, I think that the most practical way to fix
 this is for deconstruct_recurse and distribute_qual_to_rels to
 effectively move such a qual to the place where it logically belongs;
 that is, rather than processing it when we look at the lower WHERE
 clause, set it aside for a moment and then add it back when looking at
 the ON clause of the appropriate outer join.  This should be reasonably
 easy to do by keeping a list of postponed lateral clauses while we're
 scanning the join tree.

Here's a draft patch for this.  The comments need a bit more work
probably, but barring objection I want to push this in before this
afternoon's 9.3rc1 wrap.

regards, tom lane

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 98f601c..e055088 100644
*** a/src/backend/optimizer/plan/initsplan.c
--- b/src/backend/optimizer/plan/initsplan.c
*** int			from_collapse_limit;
*** 36,47 
  int			join_collapse_limit;
  
  
  static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
  		   Index rtindex);
  static void add_lateral_info(PlannerInfo *root, Relids lhs, Relids rhs);
  static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
  	bool below_outer_join,
! 	Relids *qualscope, Relids *inner_join_rels);
  static SpecialJoinInfo *make_outerjoininfo(PlannerInfo *root,
     Relids left_rels, Relids right_rels,
     Relids inner_join_rels,
--- 36,56 
  int			join_collapse_limit;
  
  
+ /* Elements of the postponed_qual_list used during deconstruct_recurse */
+ typedef struct PostponedQual
+ {
+ 	Node	   *qual;			/* a qual clause waiting to be processed */
+ 	Relids		relids;			/* the set of baserels it references */
+ } PostponedQual;
+ 
+ 
  static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
  		   Index rtindex);
  static void add_lateral_info(PlannerInfo *root, Relids lhs, Relids rhs);
  static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
  	bool below_outer_join,
! 	Relids *qualscope, Relids *inner_join_rels,
! 	List **postponed_qual_list);
  static SpecialJoinInfo *make_outerjoininfo(PlannerInfo *root,
     Relids left_rels, Relids right_rels,
     Relids inner_join_rels,
*** static void distribute_qual_to_rels(Plan
*** 53,59 
  		Relids qualscope,
  		Relids ojscope,
  		Relids outerjoin_nonnullable,
! 		Relids deduced_nullable_relids);
  static bool check_outerjoin_delay(PlannerInfo *root, Relids *relids_p,
  	  Relids *nullable_relids_p, bool is_pushed_down);
  static bool check_equivalence_delay(PlannerInfo *root,
--- 62,69 
  		Relids qualscope,
  		Relids ojscope,
  		Relids outerjoin_nonnullable,
! 		Relids deduced_nullable_relids,
! 		List **postponed_qual_list);
  static bool check_outerjoin_delay(PlannerInfo *root, Relids *relids_p,
  	  Relids *nullable_relids_p, bool is_pushed_down);
  static bool check_equivalence_delay(PlannerInfo *root,
*** add_lateral_info(PlannerInfo *root, Reli
*** 630,644 
  List *
  deconstruct_jointree(PlannerInfo *root)
  {
  	Relids		qualscope;
  	Relids		inner_join_rels;
  
  	/* Start recursion at top of jointree */
  	Assert(root-parse-jointree != NULL 
  		   IsA(root-parse-jointree, FromExpr));
  
! 	return deconstruct_recurse(root, (Node *) root-parse-jointree, false,
! 			   qualscope, inner_join_rels);
  }
  
  /*
--- 640,662 
  List *
  deconstruct_jointree(PlannerInfo *root)
  {
+ 	List	   *result;
  	Relids		qualscope;
  	Relids		inner_join_rels;
+ 	List	   *postponed_qual_list = NIL;
  
  	/* Start recursion at top of jointree */
  	Assert(root-parse-jointree != NULL 
  		   IsA(root-parse-jointree, FromExpr));
  
! 	result = deconstruct_recurse(root, (Node *) root-parse-jointree, false,
!  qualscope, inner_join_rels,
!  postponed_qual_list);
! 
! 	/* Shouldn't be any leftover quals */
! 	Assert(postponed_qual_list == NIL);
! 
! 	return result;
  }
  
  /*
*** deconstruct_jointree(PlannerInfo *root)
*** 656,668 
   *	*inner_join_rels gets the set of base Relids syntactically included in
   *		inner joins appearing at or below this jointree node (do not modify
   *		or free this, either)
   *	Return value is the appropriate joinlist for this jointree node
   *
   * In addition, entries will be added to root-join_info_list for outer joins.
   */
  static List *
  deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join,
! 	Relids *qualscope, Relids *inner_join_rels)
  {
  	List	   *joinlist;
  
--- 674,689 
   *	*inner_join_rels gets the set of base Relids syntactically included in
   *		inner joins appearing at or below this jointree node (do not modify
   *		or 

Re: [HACKERS] LATERAL quals revisited

2013-08-14 Thread Tom Lane
Antonin Houska antonin.hou...@gmail.com writes:
 On 07/04/2013 06:11 PM, Antonin Houska wrote:
 On 07/03/2013 08:32 PM, Tom Lane wrote:
 Another possibility would be to keep the optimization, but disable it in
 queries that use LATERAL.  I don't much care for that though --- seems
 too Rube Goldbergish, and in any case I have a lot less faith in the
 whole concept now than I had before I started digging into this issue.

 I constructed a query that triggers the optimization - see attachment 
 with comments.

Thanks for poking at this.

 EXPLAIN shows the same plan with or without the ph_may_need 
 optimization, but that might be data problem (my tables are empty).

Yeah, I didn't have much luck getting a different plan even with data in
the tables.  What you'd need for this to be important would be for a join
order that's precluded without the ph_may_need logic to be significantly
better than the join orders that are still allowed.  While that's
certainly within the realm of possibility, the difficulty of triggering
the case at all reinforces my feeling that this optimization isn't worth
bothering with.  For the moment I'm just going to take it out.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL quals revisited

2013-07-19 Thread Ashutosh Bapat
I have couple of questions.

On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 I've been studying the bug reported at

 http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
 that the planner can do the wrong thing with queries like

 SELECT * FROM
   i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;

 I think the fundamental problem is that, because the i.n = j.n clause
 appears syntactically in WHERE, the planner is treating it as if it were
 an inner-join clause; but really it ought to be considered a clause of
 the upper LEFT JOIN.  That is, semantically this query ought to be
 equivalent to

 SELECT * FROM
   i LEFT JOIN LATERAL (SELECT * FROM j) j ON i.n = j.n;

 However, because distribute_qual_to_rels doesn't see the clause as being
 attached to the outer join, it's not marked with the correct properties
 and ends up getting evaluated in the wrong place (as a filter clause
 not a join filter clause).  The bug is masked in the test cases we've
 used so far because those cases are designed to let the clause get
 pushed down into the scan of the inner relation --- but if it doesn't
 get pushed down, it's evaluated the wrong way.

 After some contemplation, I think that the most practical way to fix
 this is for deconstruct_recurse and distribute_qual_to_rels to
 effectively move such a qual to the place where it logically belongs;
 that is, rather than processing it when we look at the lower WHERE
 clause, set it aside for a moment and then add it back when looking at
 the ON clause of the appropriate outer join.  This should be reasonably
 easy to do by keeping a list of postponed lateral clauses while we're
 scanning the join tree.

 For there to *be* a unique appropriate outer join, we need to require
 that a LATERAL-using qual clause that's under an outer join contain
 lateral references only to the outer side of the nearest enclosing outer
 join.  There's no such restriction in the spec of course, but we can
 make it so by refusing to flatten a sub-select if pulling it up would
 result in having a clause in the outer query that violates this rule.
 There's already some code in prepjointree.c (around line 1300) that
 attempts to enforce this, though now that I look at it again I'm not
 sure it's covering all the bases.  We may need to extend that check.


Why do we need this restriction? Wouldn't a place (specifically join qual
at such a place) in join tree where all the participating relations are
present, serve as a place where the clause can be applied. E.g. in the query

select * from tab1 left join tab2 t2 using (val) left join lateral (select
val from tab2 where val2 = tab1.val * t2.val) t3 using (val);

Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a
place where we are computing join between tab1, t2 and t3?


 I'm inclined to process all LATERAL-using qual clauses this way, ie
 postpone them till we recurse back up to a place where they can
 logically be evaluated.  That won't make any real difference when no
 outer joins are present, but it will eliminate the ugliness that right
 now distribute_qual_to_rels is prevented from sanity-checking the scope
 of the references in a qual when LATERAL is present.  If we do it like
 this, we can resurrect full enforcement of that sanity check, and then
 throw an error if any postponed quals are left over when we're done
 recursing.


Parameterized nested loop join would always be able to evaluate a LATERAL
query. Instead of throwing error, why can't we choose that as the default
strategy whenever we fail to flatten subquery?

Can we put the clause with lateral references at its appropriate place
while flattening the subquery? IMO, that will be cleaner and lesser work
than first pulling the clause and then putting it back again? Right, now,
we do not have that capability in pull_up_subqueries() but given its
recursive structure, it might be easier to do it there.


 Thoughts, better ideas?

 regards, tom lane


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company


Re: [HACKERS] LATERAL quals revisited

2013-07-19 Thread Tom Lane
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 For there to *be* a unique appropriate outer join, we need to require
 that a LATERAL-using qual clause that's under an outer join contain
 lateral references only to the outer side of the nearest enclosing outer
 join.  There's no such restriction in the spec of course, but we can
 make it so by refusing to flatten a sub-select if pulling it up would
 result in having a clause in the outer query that violates this rule.
 There's already some code in prepjointree.c (around line 1300) that
 attempts to enforce this, though now that I look at it again I'm not
 sure it's covering all the bases.  We may need to extend that check.

 Why do we need this restriction? Wouldn't a place (specifically join qual
 at such a place) in join tree where all the participating relations are
 present, serve as a place where the clause can be applied.

No.  If you hoist a qual that appears below an outer join to above the
outer join, you get wrong results in general: you might eliminate rows
from the outer side of the join, which a qual from within the inner side
should never be able to do.

 select * from tab1 left join tab2 t2 using (val) left join lateral (select
 val from tab2 where val2 = tab1.val * t2.val) t3 using (val);
 Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a
 place where we are computing join between tab1, t2 and t3?

This particular example doesn't violate the rule I gave above, since
both tab1 and t2 are on the left side of the join to the lateral
subquery, and the qual doesn't have to get hoisted *past* an outer join,
only to the outer join of {tab1,t2} with {t3}.

 I'm inclined to process all LATERAL-using qual clauses this way, ie
 postpone them till we recurse back up to a place where they can
 logically be evaluated.  That won't make any real difference when no
 outer joins are present, but it will eliminate the ugliness that right
 now distribute_qual_to_rels is prevented from sanity-checking the scope
 of the references in a qual when LATERAL is present.  If we do it like
 this, we can resurrect full enforcement of that sanity check, and then
 throw an error if any postponed quals are left over when we're done
 recursing.

 Parameterized nested loop join would always be able to evaluate a LATERAL
 query. Instead of throwing error, why can't we choose that as the default
 strategy whenever we fail to flatten subquery?

I think you misunderstood.  That error would only be a sanity check that
we'd accounted for all qual clauses, it's not something a user should
ever see.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL quals revisited

2013-07-12 Thread Antonin Houska

On 07/04/2013 06:11 PM, Antonin Houska wrote:

On 07/03/2013 08:32 PM, Tom Lane wrote:

Another possibility would be to keep the optimization, but disable it in
queries that use LATERAL.  I don't much care for that though --- seems
too Rube Goldbergish, and in any case I have a lot less faith in the
whole concept now than I had before I started digging into this issue.

Thoughts?

I noticed EXPLAIN in some regression tests. So if they all pass after 
removal of this optimization, it might indicate that it was really 
insignificant. But alternatively it may just be a lack of focus on 
this feature in the test queries. Digging for (non-LATERAL) queries or 
rather patterns where the ph_may_need optimization clearly appears to 
be important sounds to me like a good SQL exercise, but I'm afraid I 
won't have time for it in the next few days.




I constructed a query that triggers the optimization - see attachment 
with comments. (Note that the relid sets are derived from my current 
knowledge of the logic. I haven't figured out how to check them easily 
in gdb session.)


The intention was that the top-level OJ references LHS of the join below 
rather than the RHS. That should increase the likelihood that the PHV 
becomes the only obstacle for join commuting. And therefore the 
ph_may_need optimization should unblock some combinations that would be 
impossible otherwise.


However I could not see the condition

  if (bms_is_subset(phinfo-ph_may_need, min_righthand))
  continue;

met for the top-level join even though the supposed ph_may_need did not 
contain tab1. Then it struck me that min_righthand can be the problem. 
So I changed the join clause to reference RHS of j1, hoping that it 
should make min_righthand bigger. And that really triggered the condition.


EXPLAIN shows the same plan with or without the ph_may_need 
optimization, but that might be data problem (my tables are empty).


More important is the fact that I could only avoid addition of the PHV's 
eval_at to min_righthand at the cost of adding the whole j1 join (i.e. 
more than just eval_at).


Although the idea behind ph_may_need is clever, I can now imagine that 
other techniques of the planner can substitute for it. There might be 
examples showing the opposite but such are beyond my imagination.


// Antonin Houska (Tony)

SELECT  tab1.i
FROMtab1
-- The ph_may_need optimization should be effective for the
	-- top-level LEFT JOIN. The PHV in sub1 is only referenced below it.
LEFT JOIN
	(  tab2
   LEFT JOIN
   ( tab3
 LEFT JOIN
 ( SELECT
   -- This expression should be wrapped in the PHV
   -- That PHV should have eval_at = {tab4, tab5}.
   -- 
   -- Join clause of j1 is the highest reference to the PHV.
   -- Thus ph_may_need should be {tab2, tab3, tab4, tab5}. Therefore
   -- the ph_may_need optimization should avoid addition of eval_at
   -- to min_righthand of the top-level join's SpecialJoinInfo.
		   COALESCE(tab4.l, tab5.m, 1) AS x
 FROM  tab4
   LEFT JOIN
   tab5
   ON l = m
 ) AS sub1(x)
 ON tab3.k = sub1.x
   ) AS j2(k, x) 
	   ON tab2.j = j2.x
 ) AS j1(j, k)
 -- This clause references j.k (RHS of the lower join) to keep min_righthand
 -- of the top-level join bigger (ph_may_need needs to be its subset).
	 ON tab1.i = j1.k;
CREATE TABLE tab1(i int);
CREATE TABLE tab2(j int);
CREATE TABLE tab3(k int);
CREATE TABLE tab4(l int);
CREATE TABLE tab5(m int);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL quals revisited

2013-07-04 Thread Antonin Houska

On 07/03/2013 08:32 PM, Tom Lane wrote:

Another possibility would be to keep the optimization, but disable it in
queries that use LATERAL.  I don't much care for that though --- seems
too Rube Goldbergish, and in any case I have a lot less faith in the
whole concept now than I had before I started digging into this issue.

Thoughts?

I noticed EXPLAIN in some regression tests. So if they all pass after 
removal of this optimization, it might indicate that it was really 
insignificant. But alternatively it may just be a lack of focus on this 
feature in the test queries. Digging for (non-LATERAL) queries or rather 
patterns where the ph_may_need optimization clearly appears to be 
important sounds to me like a good SQL exercise, but I'm afraid I won't 
have time for it in the next few days.



//Antonin Houska (Tony)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL quals revisited

2013-07-03 Thread Tom Lane
I wrote:
 So attached is a draft patch for this.  It's not complete yet because
 there are various comments that are now wrong and need to be updated;
 but I think the code is functioning correctly.

Hm, spoke too soon :-(.  This query causes an assertion failure, with or
without my draft patch:

select c.*,a.*,ss1.q1,ss2.q1,ss3.* from
  int8_tbl c left join (
int8_tbl a left join
  (select q1, coalesce(q2,f1) as x from int8_tbl b, int4_tbl b2) ss1
  on a.q2 = ss1.q1
cross join
lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2
  ) on c.q2 = ss2.q1,
  lateral (select * from int4_tbl i where ss2.y  f1) ss3;

TRAP: FailedAssertion(!(bms_is_subset(phinfo-ph_needed, 
phinfo-ph_may_need)), File: initsplan.c, Line: 213)

What's happening is that distribute_qual_to_rels concludes (correctly)
that the ss2.y  f1 clause must be postponed until after the nest of
left joins, since those could null ss2.y.  So the PlaceHolderVar for
ss2.y is marked as being needed at the topmost join level.  However,
find_placeholders_in_jointree had only marked the PHV as being maybe
needed to scan the i relation, since that's what the syntactic
location of the reference implies.  Since we depend on the assumption
that ph_needed is always a subset of ph_may_need, there's an assertion
that fires if that stops being true, and that's what's crashing.

After some thought about this, I'm coming to the conclusion that lateral
references destroy the ph_maybe_needed optimization altogether: we
cannot derive an accurate estimate of where a placeholder will end up in
the final qual distribution, short of essentially doing all the work in
deconstruct_jointree over again.  I guess in principle we could repeat
deconstruct_jointree until we had stable estimates of the ph_needed
locations, but that would be expensive and probably would induce a lot
of new planner bugs (since the data structure changes performed during
deconstruct_jointree aren't designed to be backed out easily).

The only place where ph_may_need is actually used is in this bit in
make_outerjoininfo():

/*
 * Examine PlaceHolderVars.  If a PHV is supposed to be evaluated within
 * this join's nullable side, and it may get used above this join, then
 * ensure that min_righthand contains the full eval_at set of the PHV.
 * This ensures that the PHV actually can be evaluated within the RHS.
 * Note that this works only because we should already have determined the
 * final eval_at level for any PHV syntactically within this join.
 */
foreach(l, root-placeholder_list)
{
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l);
Relidsph_syn_level = phinfo-ph_var-phrels;

/* Ignore placeholder if it didn't syntactically come from RHS */
if (!bms_is_subset(ph_syn_level, right_rels))
continue;

/* We can also ignore it if it's certainly not used above this join */
/* XXX this test is probably overly conservative */
if (bms_is_subset(phinfo-ph_may_need, min_righthand))
continue;

/* Else, prevent join from being formed before we eval the PHV */
min_righthand = bms_add_members(min_righthand, phinfo-ph_eval_at);
}

Looking at it again, it's not really clear that skipping placeholders in
this way results in very much optimization --- sometimes we can avoid
constraining join order, but how often?  I tried diking out the check
on ph_may_need from this loop, and saw no changes in the regression test
results (not that that proves a whole lot about optimization of complex
queries).  So I'm pretty tempted to just remove ph_may_need, along with
the machinery that computes it.

Another possibility would be to keep the optimization, but disable it in
queries that use LATERAL.  I don't much care for that though --- seems
too Rube Goldbergish, and in any case I have a lot less faith in the
whole concept now than I had before I started digging into this issue.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL quals revisited

2013-06-26 Thread Antonin Houska

On 06/26/2013 12:52 AM, Tom Lane wrote:

Instead of setting it aside, can we (mis)use placeholder var (PHV), to
ensure that the WHERE clause is evaluated below the OJ; instead of
combining it with the ON clause?

No, that doesn't help; it has to be part of the joinquals at the join
node, or you don't get the right execution semantics.
When I wrote 'below the OJ' I meant to retain something of the original 
semantics (just like the subquery applies the WHERE clause below the OJ).

However that's probably too restrictive and your next arguments

Plus you could
lose some optimization opportunities, for example if we fail to see
that there's a strict join clause associated with the outer join
(cf lhs_strict).  Worse, I think wrapping a PHV around an otherwise
indexable clause would prevent using it for an indexscan.


also confirm the restrictiveness. So I can forget.

One more concern anyway: doesn't your proposal make subquery pull-up a 
little bit risky in terms of cost of the resulting plan?


IMO the subquery in the original query may filter out many rows and thus 
decrease the number of pairs to be evaluated by the join the ON clause 
belongs to.
If the WHERE clause moves up, then the resulting plan might be less 
efficient than the one we'd get if the subquery hadn't been pulled-up at 
all.


However at the time of cost evaluation there's no way to get back (not 
even to notice the higher cost) because the original subquery has gone 
at earlier stage of the planning.


Regards,
Antonin Houska (Tony)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL quals revisited

2013-06-26 Thread Tom Lane
Antonin Houska antonin.hou...@gmail.com writes:
 If the WHERE clause moves up, then the resulting plan might be less 
 efficient than the one we'd get if the subquery hadn't been pulled-up at 
 all.

No, because we can push the qual back down again (using a parameterized
path) if that's appropriate.  The problem at this stage is to understand
the semantics of the outer join correctly, not to make a choice of what
the plan will be.

In fact, the reason we'd not noticed this bug before is exactly that
all the test cases in the regression tests do end up pushing the qual
back down.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] LATERAL quals revisited

2013-06-25 Thread Tom Lane
I've been studying the bug reported at
http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local
that the planner can do the wrong thing with queries like

SELECT * FROM
  i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true;

I think the fundamental problem is that, because the i.n = j.n clause
appears syntactically in WHERE, the planner is treating it as if it were
an inner-join clause; but really it ought to be considered a clause of
the upper LEFT JOIN.  That is, semantically this query ought to be
equivalent to

SELECT * FROM
  i LEFT JOIN LATERAL (SELECT * FROM j) j ON i.n = j.n;

However, because distribute_qual_to_rels doesn't see the clause as being
attached to the outer join, it's not marked with the correct properties
and ends up getting evaluated in the wrong place (as a filter clause
not a join filter clause).  The bug is masked in the test cases we've
used so far because those cases are designed to let the clause get
pushed down into the scan of the inner relation --- but if it doesn't
get pushed down, it's evaluated the wrong way.

After some contemplation, I think that the most practical way to fix
this is for deconstruct_recurse and distribute_qual_to_rels to
effectively move such a qual to the place where it logically belongs;
that is, rather than processing it when we look at the lower WHERE
clause, set it aside for a moment and then add it back when looking at
the ON clause of the appropriate outer join.  This should be reasonably
easy to do by keeping a list of postponed lateral clauses while we're
scanning the join tree.

For there to *be* a unique appropriate outer join, we need to require
that a LATERAL-using qual clause that's under an outer join contain
lateral references only to the outer side of the nearest enclosing outer
join.  There's no such restriction in the spec of course, but we can
make it so by refusing to flatten a sub-select if pulling it up would
result in having a clause in the outer query that violates this rule.
There's already some code in prepjointree.c (around line 1300) that
attempts to enforce this, though now that I look at it again I'm not
sure it's covering all the bases.  We may need to extend that check.

I'm inclined to process all LATERAL-using qual clauses this way, ie
postpone them till we recurse back up to a place where they can
logically be evaluated.  That won't make any real difference when no
outer joins are present, but it will eliminate the ugliness that right
now distribute_qual_to_rels is prevented from sanity-checking the scope
of the references in a qual when LATERAL is present.  If we do it like
this, we can resurrect full enforcement of that sanity check, and then
throw an error if any postponed quals are left over when we're done
recursing.

Thoughts, better ideas?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL quals revisited

2013-06-25 Thread Antonin Houska
(Please excuse me if my proposal sounds silly, i'm still not too 
advanced in this area...)


On 06/25/2013 10:00 PM, Tom Lane wrote:

After some contemplation, I think that the most practical way to fix
this is for deconstruct_recurse and distribute_qual_to_rels to
effectively move such a qual to the place where it logically belongs;
that is, rather than processing it when we look at the lower WHERE
clause, set it aside for a moment and then add it back when looking at
the ON clause of the appropriate outer join.  This should be reasonably
easy to do by keeping a list of postponed lateral clauses while we're
scanning the join tree.



Instead of setting it aside, can we (mis)use placeholder var (PHV), to 
ensure
that the WHERE clause is evaluated below the OJ; instead of combining it 
with

the ON clause?  That would be a special PHV(s) in that it's not actually
referenced from outside the subquery.

Whether I'm right or not, I seem to have found another problem while
trying to enforce such a PHV:

postgres=# SELECT i.*, j.* FROM i LEFT JOIN LATERAL (SELECT COALESCE(i) 
FROM j WHERE (i.n = j.n)) j ON true;

The connection to the server was lost. Attempting reset: Failed.

TRAP: FailedAssertion(!(!bms_overlap(min_lefthand, min_righthand)), 
File: initsplan.c, Line: 1043)

LOG:  server process (PID 24938) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: SELECT i.*, j.* FROM i LEFT JOIN 
LATERAL (SELECT COALESCE(i) FROM j WHERE (i.n = j.n)) j ON true;

LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.

FATAL:  the database system is in recovery mode

I'm not able to judge right now whether the Assert() statement is the 
problem itself or anything

else. You'll probably know better.

(4f14c86d7434376b95477aeeb07fcc7272f4c47d is the last commit in my 
environment)


Regards,
Antonin Houska (Tony)





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL quals revisited

2013-06-25 Thread Tom Lane
Antonin Houska antonin.hou...@gmail.com writes:
 On 06/25/2013 10:00 PM, Tom Lane wrote:
 After some contemplation, I think that the most practical way to fix
 this is for deconstruct_recurse and distribute_qual_to_rels to
 effectively move such a qual to the place where it logically belongs;
 that is, rather than processing it when we look at the lower WHERE
 clause, set it aside for a moment and then add it back when looking at
 the ON clause of the appropriate outer join.

 Instead of setting it aside, can we (mis)use placeholder var (PHV), to
 ensure that the WHERE clause is evaluated below the OJ; instead of
 combining it with the ON clause?

No, that doesn't help; it has to be part of the joinquals at the join
node, or you don't get the right execution semantics.  Plus you could
lose some optimization opportunities, for example if we fail to see
that there's a strict join clause associated with the outer join
(cf lhs_strict).  Worse, I think wrapping a PHV around an otherwise
indexable clause would prevent using it for an indexscan.

 Whether I'm right or not, I seem to have found another problem while
 trying to enforce such a PHV:

 postgres=# SELECT i.*, j.* FROM i LEFT JOIN LATERAL (SELECT COALESCE(i) 
 FROM j WHERE (i.n = j.n)) j ON true;
 The connection to the server was lost. Attempting reset: Failed.

[ pokes at that ... ]  Hm, right offhand this seems like an independent
issue --- the ph_eval_at for the PHV is wrong AFAICS.  Thanks for
reporting it!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL, UNNEST and spec compliance

2013-01-25 Thread David Fetter
On Thu, Jan 24, 2013 at 09:12:41PM -0800, David Fetter wrote:
 On Thu, Jan 24, 2013 at 09:51:46AM -0800, David Fetter wrote:
  Folks,
  
  Andrew Gierth asked me to send this out as his email is in a parlous
  state at the moment.  My comments will follow in replies.  Without
  further ado:
  [snip]
  
  As I see it, the current options are:
  
  1. Do nothing, and insist on non-standard use of the LATERAL keyword.
  
  2. Add UNNEST to the grammar (or parse analysis) as a special case, making
 it implicitly LATERAL.
  
 (This would make implementing S301 easier, but special cases are ugly.)
  
  3. Make all cases of SRFs in the FROM-clause implicitly LATERAL.
  
 (As far as I can tell, those cases whose behaviour would be changed by
 this actually produce errors in versions prior to 9.3, so no working
 code should be affected.)
  
  Since LATERAL is new in 9.3, I think the pros and cons of these choices
  should be considered now, rather than being allowed to slide by unexamined.
 
 Please find attached a patch which implements approach 3.  The vast
 majority of it is changes to the regression tests.  The removed
 regression tests in join.{sql,out} are no longer errors, although some
 of them are pretty standard DoS attacks, hence they're all removed.
 
 Cheers,
 David.

Oops.  Misspelled rtekind in the previous patch.  Here's a corrected
one, much shorter.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***
*** 9391,9397  table_ref:   relation_expr opt_alias_clause
| func_table func_alias_clause
{
RangeFunction *n = 
makeNode(RangeFunction);
!   n-lateral = false;
n-funccallnode = $1;
n-alias = linitial($2);
n-coldeflist = lsecond($2);
--- 9391,9397 
| func_table func_alias_clause
{
RangeFunction *n = 
makeNode(RangeFunction);
!   n-lateral = true;
n-funccallnode = $1;
n-alias = linitial($2);
n-coldeflist = lsecond($2);
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
***
*** 7928,7934  get_from_clause_item(Node *jtnode, Query *query, 
deparse_context *context)
deparse_columns *colinfo = deparse_columns_fetch(varno, dpns);
boolprintalias;
  
!   if (rte-lateral)
appendStringInfoString(buf, LATERAL );
  
/* Print the FROM item proper */
--- 7928,7934 
deparse_columns *colinfo = deparse_columns_fetch(varno, dpns);
boolprintalias;
  
!   if (rte-lateral  rte-rtekind != RTE_FUNCTION)
appendStringInfoString(buf, LATERAL );
  
/* Print the FROM item proper */
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
***
*** 3577,3603  select * from
   Output: (COALESCE((COALESCE(b.q2, 42::bigint)), d.q2))
  (26 rows)
  
- -- test some error cases where LATERAL should have been used but wasn't
- select f1,g from int4_tbl a, generate_series(0, f1) g;
- ERROR:  column f1 does not exist
- LINE 1: select f1,g from int4_tbl a, generate_series(0, f1) g;
- ^
- HINT:  There is a column named f1 in table a, but it cannot be referenced 
from this part of the query.
- select f1,g from int4_tbl a, generate_series(0, a.f1) g;
- ERROR:  invalid reference to FROM-clause entry for table a
- LINE 1: select f1,g from int4_tbl a, generate_series(0, a.f1) g;
- ^
- HINT:  There is an entry for table a, but it cannot be referenced from this 
part of the query.
- select f1,g from int4_tbl a cross join generate_series(0, f1) g;
- ERROR:  column f1 does not exist
- LINE 1: ...ct f1,g from int4_tbl a cross join generate_series(0, f1) g;
-  ^
- HINT:  There is a column named f1 in table a, but it cannot be referenced 
from this part of the query.
- select f1,g from int4_tbl a cross join generate_series(0, a.f1) g;
- ERROR:  invalid reference to FROM-clause entry for table a
- LINE 1: ... f1,g from 

Re: [HACKERS] LATERAL, UNNEST and spec compliance

2013-01-25 Thread Stephen Frost
* David Fetter (da...@fetter.org) wrote:
 As I see it, the current options are:
 
 1. Do nothing, and insist on non-standard use of the LATERAL keyword.

I'm not a big fan of this.  Providing a good error message saying you
need to use LATERAL for this query to work makes it slightly better,
but I don't feel like there's really any ambiguity here.

 2. Add UNNEST to the grammar (or parse analysis) as a special case, making
it implicitly LATERAL.
 
(This would make implementing S301 easier, but special cases are ugly.)

This I really don't like.

 3. Make all cases of SRFs in the FROM-clause implicitly LATERAL.
 
(As far as I can tell, those cases whose behaviour would be changed by
this actually produce errors in versions prior to 9.3, so no working
code should be affected.)

+1 for me on this idea.  If you're calling an SRF, passing in a lateral
value, 'LATERAL' seems like it's just a noise word, and apparently the
SQL authors felt the same, as they don't require it for unnest().

 Since LATERAL is new in 9.3, I think the pros and cons of these choices
 should be considered now, rather than being allowed to slide by unexamined.

I agree that we should really hammer this down before 9.3 is out the
door.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] LATERAL, UNNEST and spec compliance

2013-01-25 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * David Fetter (da...@fetter.org) wrote:
 3. Make all cases of SRFs in the FROM-clause implicitly LATERAL.
 
 (As far as I can tell, those cases whose behaviour would be changed by
 this actually produce errors in versions prior to 9.3, so no working
 code should be affected.)

 +1 for me on this idea.  If you're calling an SRF, passing in a lateral
 value, 'LATERAL' seems like it's just a noise word, and apparently the
 SQL authors felt the same, as they don't require it for unnest().

At first I didn't like this idea, but it's growing on me.

However ... David is wrong to claim that it's zero-risk.  It's true that
an SRF can't contain any side-references today, but it can contain an
outer reference.  Consider a case like

SELECT ... FROM a WHERE a.x IN (SELECT ... FROM b, srf(y) WHERE ...)

In existing releases the y could be a valid outer reference to a.y.
If b also has a column y, David's proposal would cause us to prefer
that interpretation, since b.y would be more closely nested than a.y.
If you're lucky, you'd get a type-mismatch error, but if the two y's
are of similar datatypes the query would just silently do something
different than it used to.

This is a little bit far-fetched, but it could happen.  As against that,
we make incompatible changes in every release, and it does seem like
assuming LATERAL for functions in FROM would be a usability gain most
of the time.  And special-casing UNNEST to satisfy the standard seems
*really* ugly.

 I agree that we should really hammer this down before 9.3 is out the
 door.

Yeah, if we're going to do this it'd make the most sense to do it in the
same release that introduces LATERAL.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL, UNNEST and spec compliance

2013-01-25 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 However ... David is wrong to claim that it's zero-risk.  It's true that
 an SRF can't contain any side-references today, but it can contain an
 outer reference.  Consider a case like
 
 SELECT ... FROM a WHERE a.x IN (SELECT ... FROM b, srf(y) WHERE ...)

I see what you mean, but on the other hand, that looks like something we
might actually want to complain about as 'y' is pretty clearly ambiguous
here.  I'm a bit surprised that doesn't already throw an error.

 This is a little bit far-fetched, but it could happen.  As against that,
 we make incompatible changes in every release, and it does seem like
 assuming LATERAL for functions in FROM would be a usability gain most
 of the time.  And special-casing UNNEST to satisfy the standard seems
 *really* ugly.

It's definitely far-fetched, imv.  If it's possible, within reason, to
explicitly throw a please disambiguate 'y' type of error in those
specific cases, that'd be nice, but I don't think it'd be required.  A
mention in the release notes would be sufficient.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] LATERAL, UNNEST and spec compliance

2013-01-25 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 SELECT ... FROM a WHERE a.x IN (SELECT ... FROM b, srf(y) WHERE ...)

Actually, this appears to fail already, at least in 9.2.2:

= select * from (values (1)) v(a) where v.a in (select x from (values (2)) 
v2(a), 
- generate_series(1,a) x);
ERROR:  function expression in FROM cannot refer to other relations of same 
query level
LINE 2: generate_series(1,a) x);
  ^

Unless it's something else that you were referring to...?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] LATERAL, UNNEST and spec compliance

2013-01-25 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 SELECT ... FROM a WHERE a.x IN (SELECT ... FROM b, srf(y) WHERE ...)

 Actually, this appears to fail already, at least in 9.2.2:

 = select * from (values (1)) v(a) where v.a in (select x from (values (2)) 
 v2(a), 
 - generate_series(1,a) x);
 ERROR:  function expression in FROM cannot refer to other relations of same 
 query level
 LINE 2: generate_series(1,a) x);
   ^

Huh ... you're right, I'd forgotten about that.  That's an ancient bug
that got fixed in passing in the LATERAL work.  So, as long as we're not
going to fix that bug in the back branches (which would be difficult
anyway IIRC), we don't have a compatibility problem ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL, UNNEST and spec compliance

2013-01-25 Thread Tom Lane
David Fetter da...@fetter.org writes:
 Please find attached a patch which implements approach 3.  The vast
 majority of it is changes to the regression tests.  The removed
 regression tests in join.{sql,out} are no longer errors, although some
 of them are pretty standard DoS attacks, hence they're all removed.

Here's a less quick-hack-y approach to that.

regards, tom lane

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index bcee9468240e8a10b8e491a8f1ab8a1e2c5d9ede..caa9f1b3389e5ce57e2e50d13011e41c0ed3d11b 100644
*** a/doc/src/sgml/queries.sgml
--- b/doc/src/sgml/queries.sgml
*** SELECT *
*** 717,730 
  /indexterm
  
  para
!  Subqueries and table functions appearing in literalFROM/ can be
   preceded by the key word literalLATERAL/.  This allows them to
   reference columns provided by preceding literalFROM/ items.
!  (Without literalLATERAL/literal, each literalFROM/ item is
   evaluated independently and so cannot cross-reference any other
   literalFROM/ item.)
   A literalLATERAL/literal item can appear at top level in the
!  literalFROM/ list, or within a literalJOIN/ tree; in the latter
   case it can also refer to any items that are on the left-hand side of a
   literalJOIN/ that it is on the right-hand side of.
  /para
--- 717,740 
  /indexterm
  
  para
!  Subqueries appearing in literalFROM/ can be
   preceded by the key word literalLATERAL/.  This allows them to
   reference columns provided by preceding literalFROM/ items.
!  (Without literalLATERAL/literal, each subquery is
   evaluated independently and so cannot cross-reference any other
   literalFROM/ item.)
+ /para
+ 
+ para
+  Table functions appearing in literalFROM/ can also be
+  preceded by the key word literalLATERAL/, but for functions the
+  key word is optional; the function's arguments can contain references
+  to columns provided by preceding literalFROM/ items in any case.
+ /para
+ 
+ para
   A literalLATERAL/literal item can appear at top level in the
!  literalFROM/ list, or within a literalJOIN/ tree.  In the latter
   case it can also refer to any items that are on the left-hand side of a
   literalJOIN/ that it is on the right-hand side of.
  /para
*** FROM polygons p1 CROSS JOIN LATERAL vert
*** 770,776 
   polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
  WHERE (v1 lt;-gt; v2) lt; 10 AND p1.id != p2.id;
  /programlisting
!  or in several other equivalent formulations.
  /para
  
  para
--- 780,788 
   polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2
  WHERE (v1 lt;-gt; v2) lt; 10 AND p1.id != p2.id;
  /programlisting
!  or in several other equivalent formulations.  (As already mentioned,
!  the literalLATERAL/ key word is unnecessary in this example, but
!  we use it for clarity.)
  /para
  
  para
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 26d511fad8c5b8d02bda618006ce2606036db7c7..0f9d52753d832fa458aca563fa2bfcf558120818 100644
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
*** TABLE [ ONLY ] replaceable class=param
*** 504,521 
   varlistentry
termliteralLATERAL/literal/term
listitem
!paraThe literalLATERAL/literal key word can precede a
! sub-commandSELECT/command or function-call literalFROM/
! item.  This allows the sub-commandSELECT/command or function
! expression to refer to columns of literalFROM/ items that appear
! before it in the literalFROM/ list.  (Without
! literalLATERAL/literal, each literalFROM/ item is evaluated
! independently and so cannot cross-reference any other
! literalFROM/ item.)  A literalLATERAL/literal item can
! appear at top level in the literalFROM/ list, or within a
! literalJOIN/ tree; in the latter case it can also refer to any
! items that are on the left-hand side of a literalJOIN/ that it is
! on the right-hand side of.
 /para
  
 para
--- 504,531 
   varlistentry
termliteralLATERAL/literal/term
listitem
!para
! The literalLATERAL/literal key word can precede a
! sub-commandSELECT/command literalFROM/ item.  This allows the
! sub-commandSELECT/command to refer to columns of literalFROM/
! items that appear before it in the literalFROM/ list.  (Without
! literalLATERAL/literal, each sub-commandSELECT/command is
! evaluated independently and so cannot cross-reference any other
! literalFROM/ item.)
!/para
! 
!para
! literalLATERAL/literal can also precede a function-call
! literalFROM/ item, but in this case it is a noise word, because
! the function expression can refer to earlier 

[HACKERS] LATERAL, UNNEST and spec compliance

2013-01-24 Thread David Fetter
Folks,

Andrew Gierth asked me to send this out as his email is in a parlous
state at the moment.  My comments will follow in replies.  Without
further ado:


SQL2008 says, for 7.6 table reference

6)
  a) If TR is contained in a from clause FC with no intervening query
expression, then the scope clause SC of TR is the select statement:
single row or innermost query specification that contains FC.  The
scope of a range variable of TR is the select list, where clause,
group by clause, having clause, and window clause of SC, together
with every lateral derived table that is simply contained in FC and
is preceded by TR, and every collection derived table that is simply
contained in FC and is preceded by TR, and the join condition of all
joined tables contained in SC that contain TR. If SC is the query
specification that is the query expression body of a simple table
query STQ, then the scope of a range variable of TR also includes the
order by clause of STQ.

This is the clause that defines the scope effect of LATERAL, and as can be
seen, it defines collection derived table, i.e. UNNEST(), as having the
same behaviour as lateral derived table.

It is also worth noting at this point that pg's FROM func() syntax is not
in the spec (the nearest is FROM TABLE(collection value expression)).

Our implementation of UNNEST currently deviates from the spec by not being
implicitly LATERAL; given the (sub)query

  SELECT * FROM sometable, UNNEST(somearray);

then somearray is required to be a parameter or outer reference rather
than a column of sometable. To get the spec's behaviour for this, we
currently have to do:

  SELECT * FROM sometable, LATERAL UNNEST(somearray);

which is non-standard syntax. (In the spec, only table subquery can
follow LATERAL.)

(We also don't accept the (optional) syntax of S301, allowing multiple
parameters to UNNEST().)

As I see it, the current options are:

1. Do nothing, and insist on non-standard use of the LATERAL keyword.

2. Add UNNEST to the grammar (or parse analysis) as a special case, making
   it implicitly LATERAL.

   (This would make implementing S301 easier, but special cases are ugly.)

3. Make all cases of SRFs in the FROM-clause implicitly LATERAL.

   (As far as I can tell, those cases whose behaviour would be changed by
   this actually produce errors in versions prior to 9.3, so no working
   code should be affected.)

Since LATERAL is new in 9.3, I think the pros and cons of these choices
should be considered now, rather than being allowed to slide by unexamined.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] LATERAL and VOLATILE functions

2012-12-15 Thread Pavel Stehule
Hello

I tested some usage of LATERAL clause, and I found so LATERAL doesn't
respects difference between VOLATILE and IMMUTABLE functions.

Is this behave expected?

-- unexpected
postgres=# select * from generate_series(1,3) g(v), LATERAL (SELECT random()) x;
;
 v │  random
───┼──
 1 │ 0.63025646051392
 2 │ 0.63025646051392
 3 │ 0.63025646051392
(3 rows)


-- expected
postgres=# select * from generate_series(1,3) g(v), LATERAL (SELECT
random() - v + v) x;
 v │ ?column?
───┼───
 1 │ 0.381548477802426
 2 │ 0.762988060247153
 3 │ 0.181648664642125
(3 rows)


Regards

Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL and VOLATILE functions

2012-12-15 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 Is this behave expected?

 -- unexpected
 postgres=# select * from generate_series(1,3) g(v), LATERAL (SELECT random()) 
 x;
 ;
  vrandom
 ---+--
  1   0.63025646051392
  2   0.63025646051392
  3   0.63025646051392
 (3 rows)

The LATERAL keyword is a no-op since x doesn't contain any
side-reference to g(v).  So you get a plain join between g and
a single-row relation x.

If the SQL standard actually specified what LATERAL means, we could
argue about whether that's a correct interpretation or not.  I haven't
been able to find anyplace where the spec defines the semantics though.

And I'm fairly certain that we *don't* want it to mean recompute
for every row generated to the left of the keyword, whether there is
a variable reference or not.  Consider for example

select ... from a, b, c join lateral d on ...

If the D item only contains references to C, it's unlikely that the
programmer wants it to be re-evaluated again for each possible row
in A*B.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL and VOLATILE functions

2012-12-15 Thread Pavel Stehule
2012/12/15 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 Is this behave expected?

 -- unexpected
 postgres=# select * from generate_series(1,3) g(v), LATERAL (SELECT 
 random()) x;
 ;
  vrandom
 ---+--
  1   0.63025646051392
  2   0.63025646051392
  3   0.63025646051392
 (3 rows)

 The LATERAL keyword is a no-op since x doesn't contain any
 side-reference to g(v).  So you get a plain join between g and
 a single-row relation x.

 If the SQL standard actually specified what LATERAL means, we could
 argue about whether that's a correct interpretation or not.  I haven't
 been able to find anyplace where the spec defines the semantics though.

 And I'm fairly certain that we *don't* want it to mean recompute
 for every row generated to the left of the keyword, whether there is
 a variable reference or not.  Consider for example

 select ... from a, b, c join lateral d on ...

 If the D item only contains references to C, it's unlikely that the
 programmer wants it to be re-evaluated again for each possible row
 in A*B.

Stable and immutable functions should be recalculated once time, but
for volatile functions is recalculation probably more natural
(expected). Every time is strange, when function random() returns same
numbers. I am not sure if this behave can be problem in real usage -
probably it can be a surprise for someone who use random() for some
testing.

Regards

Pavel


 regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lateral function as a subquery - WIP patch

2012-03-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Mar 9, 2012 at 8:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Um ... how do you get the subquery result rows to join to only the
 correct rows of the other tables?  This looks like an unconstrained join
 to me, which is not what I believe the SQL spec for LATERAL to be, and
 it doesn't seem especially useful either.  (If a subquery could do what
 people wanted, we'd not be hearing all the requests for LATERAL.)

 I think LATERAL is intended as more or less an unconstrained nested
 loop with the lateral expression on the inner side, parameterized by
 value from the outer side.  Typically it's a SRF.

Um ... if it's parameterized by values from a current row of the outer
side, then it's not an unconstrained join.  That would be like doing an
inner indexscan join and producing a cross-join result.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lateral function as a subquery - WIP patch

2012-03-10 Thread Robert Haas
On Sat, Mar 10, 2012 at 4:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Mar 9, 2012 at 8:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Um ... how do you get the subquery result rows to join to only the
 correct rows of the other tables?  This looks like an unconstrained join
 to me, which is not what I believe the SQL spec for LATERAL to be, and
 it doesn't seem especially useful either.  (If a subquery could do what
 people wanted, we'd not be hearing all the requests for LATERAL.)

 I think LATERAL is intended as more or less an unconstrained nested
 loop with the lateral expression on the inner side, parameterized by
 value from the outer side.  Typically it's a SRF.

 Um ... if it's parameterized by values from a current row of the outer
 side, then it's not an unconstrained join.  That would be like doing an
 inner indexscan join and producing a cross-join result.

True.  I just meant that no join filter was implied.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lateral function as a subquery - WIP patch

2012-03-10 Thread Antonin Houska

On 03/10/2012 02:15 AM, Tom Lane wrote:

Um ... how do you get the subquery result rows to join to only the
correct rows of the other tables?
The subquery just restricts the set of rows that the function has to 
evaluate. The main query is supposed to perform the join.
I understand, such a join causes repeated scan of the function if the 
function is on the inner side.

This looks like an unconstrained join
to me, which is not what I believe the SQL spec for LATERAL to be,

o.k., then just forget about my proposal. Thanks for your comments anyway,

Tony H.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] lateral function as a subquery - WIP patch

2012-03-09 Thread Antonin Houska

Hello,
in the related discussions mentioned on TODO list

http://archives.postgresql.org/pgsql-hackers/2009-09/msg00292.php
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00991.php
(The 1st is rather on SQL, I didn't focuss on it yet.)

the implementation is discussed from optimizer/executor's point of view.

I'm wondering why not to address the problem at earlier stage: rewrite 
the range function to a subquery.


For example:

SELECT *
FROM  a, b, func(a.i, b.j) as c, d
WHERE a.i=b.j and b.j = d.k and c1

may become

SELECT *
FROM  a, b, subquery as c, d
WHERE a.i=b.j and b.j = d.k and c1

where subquery is

SELECT func(a.i, b.j)
FROM a,b
WHERE a.i=b.j

The WHERE clause of the original query is considered a list of ANDed 
subclauses.
Given 'rt_index' is range table index of the function, only those 
subclauses are used in the substitution subquery having RT index

lower than 'rt_index'.
Even with such a partial qualification the subquery can safely exclude 
(from function calls) rows that the main query won't need anyway.


Note that

1. This is rather an alternative to the optimizer/executor focused 
approach that the past discussions covered. I'm aware of questions about 
SQL conformance.

2. I only propose this for functions, not for general queries.
3. This draft does not deal with record-returning functions (Although I 
might have some idea how to treat them.).


Is there any obvious reason not to go this way?

Attached is my (experimental) implementation.

Kind regards,
Tony.
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 7f4da92..ac4f620 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -563,25 +563,6 @@ transformRangeFunction(ParseState *pstate, RangeFunction *r)
 	assign_expr_collations(pstate, funcexpr);
 
 	/*
-	 * The function parameters cannot make use of any variables from other
-	 * FROM items.	(Compare to transformRangeSubselect(); the coding is
-	 * different though because we didn't parse as a sub-select with its own
-	 * level of namespace.)
-	 *
-	 * XXX this will need further work to support SQL99's LATERAL() feature,
-	 * wherein such references would indeed be legal.
-	 */
-	if (pstate-p_relnamespace || pstate-p_varnamespace)
-	{
-		if (contain_vars_of_level(funcexpr, 0))
-			ereport(ERROR,
-	(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
-	 errmsg(function expression in FROM cannot refer to other relations of same query level),
-	 parser_errposition(pstate,
-		locate_var_of_level(funcexpr, 0;
-	}
-
-	/*
 	 * Disallow aggregate functions in the expression.	(No reason to postpone
 	 * this check until parseCheckAggregates.)
 	 */
diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c
index 04f9622..b7c93c6 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -18,6 +18,8 @@
 #include commands/trigger.h
 #include nodes/makefuncs.h
 #include nodes/nodeFuncs.h
+#include optimizer/clauses.h
+#include optimizer/var.h
 #include parser/analyze.h
 #include parser/parse_coerce.h
 #include parser/parsetree.h
@@ -60,6 +62,7 @@ static List *matchLocks(CmdType event, RuleLock *rulelocks,
 		   int varno, Query *parsetree);
 static Query *fireRIRrules(Query *parsetree, List *activeRIRs,
 			 bool forUpdatePushedDown);
+static void substituteRTEFunction(Query *parsetree, RangeTblEntry *rte, int rt_index);
 
 
 /*
@@ -1557,6 +1560,14 @@ fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown)
 			continue;
 		}
 
+		if (rte-rtekind == RTE_FUNCTION)
+		{
+
+			substituteRTEFunction(parsetree, rte, rt_index);
+			continue;
+
+		}
+
 		/*
 		 * Joins and other non-relation RTEs can be ignored completely.
 		 */
@@ -2239,3 +2250,192 @@ QueryRewrite(Query *parsetree)
 
 	return results;
 }
+
+static void
+substituteRTEFunction(Query *parsetree, RangeTblEntry *rte, int rt_index) {
+	bool is_lateral = false;
+	FuncExpr *funcExpr;
+	ListCell *lcell;
+
+	if (!IsA(rte-funcexpr, FuncExpr)) { 
+		elog(ERROR, unrecognized range function type: %d, (int) nodeTag(rte-funcexpr));
+	}
+
+	funcExpr = (FuncExpr *) rte-funcexpr;
+
+	if (!funcExpr-args) {
+		return;
+	}
+
+	foreach(lcell, funcExpr-args) {
+		Node *arg = (Node *) lfirst(lcell);
+
+		is_lateral = contain_vars_of_level(arg, 0);
+		if (is_lateral) {
+			break;
+		}
+	}
+
+
+	if (is_lateral) {
+		List* from;
+		List *rt_sub = NIL;
+		Relids rt_sub_ids = NULL;
+		Query *subquery = makeNode(Query);
+		List *subquery_from = NIL;
+		TargetEntry *subquery_te = makeTargetEntry((Expr *) funcExpr, 1, NULL, false);
+		Node *subquery_quals = NULL;
+		int i;
+
+
+		Assert(parsetree-jointree != NULL  IsA(parsetree-jointree, FromExpr)); 
+
+
+		from = parsetree-jointree-fromlist;
+
+		/*
+		 * Construct FROM list of the substitution subquery.
+		 */
+		foreach (lcell, from) {
+			Node *fnode = lfirst(lcell);
+			int rt_ind_f = 0;
+
+			if (IsA(fnode, 

Re: [HACKERS] lateral function as a subquery - WIP patch

2012-03-09 Thread Tom Lane
Antonin Houska antonin.hou...@gmail.com writes:
 For example:

 SELECT *
 FROM  a, b, func(a.i, b.j) as c, d
 WHERE a.i=b.j and b.j = d.k and c1

 may become

 SELECT *
 FROM  a, b, subquery as c, d
 WHERE a.i=b.j and b.j = d.k and c1

 where subquery is

 SELECT func(a.i, b.j)
 FROM a,b
 WHERE a.i=b.j

Um ... how do you get the subquery result rows to join to only the
correct rows of the other tables?  This looks like an unconstrained join
to me, which is not what I believe the SQL spec for LATERAL to be, and
it doesn't seem especially useful either.  (If a subquery could do what
people wanted, we'd not be hearing all the requests for LATERAL.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lateral function as a subquery - WIP patch

2012-03-09 Thread Robert Haas
On Fri, Mar 9, 2012 at 8:15 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Antonin Houska antonin.hou...@gmail.com writes:
 For example:

 SELECT *
 FROM  a, b, func(a.i, b.j) as c, d
 WHERE a.i=b.j and b.j = d.k and c1

 may become

 SELECT *
 FROM  a, b, subquery as c, d
 WHERE a.i=b.j and b.j = d.k and c1

 where subquery is

 SELECT func(a.i, b.j)
 FROM a,b
 WHERE a.i=b.j

 Um ... how do you get the subquery result rows to join to only the
 correct rows of the other tables?  This looks like an unconstrained join
 to me, which is not what I believe the SQL spec for LATERAL to be, and
 it doesn't seem especially useful either.  (If a subquery could do what
 people wanted, we'd not be hearing all the requests for LATERAL.)

I think LATERAL is intended as more or less an unconstrained nested
loop with the lateral expression on the inner side, parameterized by
value from the outer side.  Typically it's a SRF.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2010-02-18 Thread Bruce Momjian
Robert Haas wrote:
 On Sat, Dec 19, 2009 at 11:01 PM, Robert Haas robertmh...@gmail.com wrote:
  On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I believe the correct approach is probably to treat values that need to
  be propagated into the inner side as executor parameters. ?This could
  replace the existing, rather crocky, management of values passed into a
  nestloop inner indexscan.
 
  What is the best place to look for the existing, rather crocky code?
 
  Follow the second argument of ExecReScan from nodeNestloop to
  nodeIndexscan.
 
  Yeah, this is grotty. ?It appears that the comment introducing
  ExecReScan() is somewhat incorrect. ?It asserts that exprCtxt is used
  only
 
 Sigh.
 
 ...is used only for index scans.  However, it's actually also used for
 bitmap scans (both heap and index) and TID scans.  Also, there appears
 to be an effort by nodes that don't use exprCtxt directly to propagate
 down through the node tree, which doesn't seem to make much sense if
 this is only intended to be used on the inner side of a nestloop.

Does some comment need to be updated?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Hitoshi Harada
2009/10/20 Andrew Gierth and...@tao11.riddles.org.uk:
 Right now, the only way pg can plan this is to do a hashjoin or
 mergejoin of the _entire content of big1 and big2_ and join the
 result against small (again in a hashjoin or mergejoin plan).
 This becomes excessively slow compared to the ideal plan:

  nested loop
      seqscan on small
      nested loop
         indexscan on big1 where id=small.id
         indexscan on big2 where id=small.id (or big1.id which is equiv)

 (The same argument applies if small is not actually small but has
 restriction clauses)

I have a similar issue on my mind, but is this the same as the topic?

SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY
large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla)

The ideal plan is SeqScan on small with filtering sub query aggregate
on large by small.id but the actual plan is full aggregate on large
since the planner doesn't push down outer qual to aggregate node. The
output will discard almost all of agged's output.


Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Robert Haas
On Sat, Dec 19, 2009 at 12:49 PM, Hitoshi Harada umi.tan...@gmail.com wrote:
 2009/10/20 Andrew Gierth and...@tao11.riddles.org.uk:
 Right now, the only way pg can plan this is to do a hashjoin or
 mergejoin of the _entire content of big1 and big2_ and join the
 result against small (again in a hashjoin or mergejoin plan).
 This becomes excessively slow compared to the ideal plan:

  nested loop
      seqscan on small
      nested loop
         indexscan on big1 where id=small.id
         indexscan on big2 where id=small.id (or big1.id which is equiv)

 (The same argument applies if small is not actually small but has
 restriction clauses)

 I have a similar issue on my mind, but is this the same as the topic?

 SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY
 large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla)

 The ideal plan is SeqScan on small with filtering sub query aggregate
 on large by small.id but the actual plan is full aggregate on large
 since the planner doesn't push down outer qual to aggregate node. The
 output will discard almost all of agged's output.

I just tried this and it works for me.

create table foo (id serial, name varchar, primary key (id));
create table bar (id serial, foo_id integer references foo (id), name
varchar, primary key (id));
insert into foo (name) select random()::varchar from generate_series(1,1000);
insert into bar (foo_id, name) select (g%10)+1, random()::varchar from
generate_series(1,1) g;
explain select * from foo inner join (select foo_id, sum(1) from bar
group by 1) x on foo.id = x.foo_id where x.foo_id = 1;

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Hitoshi Harada
2009/12/20 Robert Haas robertmh...@gmail.com:
 On Sat, Dec 19, 2009 at 12:49 PM, Hitoshi Harada umi.tan...@gmail.com wrote:
 2009/10/20 Andrew Gierth and...@tao11.riddles.org.uk:
 Right now, the only way pg can plan this is to do a hashjoin or
 mergejoin of the _entire content of big1 and big2_ and join the
 result against small (again in a hashjoin or mergejoin plan).
 This becomes excessively slow compared to the ideal plan:

  nested loop
      seqscan on small
      nested loop
         indexscan on big1 where id=small.id
         indexscan on big2 where id=small.id (or big1.id which is equiv)

 (The same argument applies if small is not actually small but has
 restriction clauses)

 I have a similar issue on my mind, but is this the same as the topic?

 SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY
 large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla)

 The ideal plan is SeqScan on small with filtering sub query aggregate
 on large by small.id but the actual plan is full aggregate on large
 since the planner doesn't push down outer qual to aggregate node. The
 output will discard almost all of agged's output.

 I just tried this and it works for me.

 create table foo (id serial, name varchar, primary key (id));
 create table bar (id serial, foo_id integer references foo (id), name
 varchar, primary key (id));
 insert into foo (name) select random()::varchar from generate_series(1,1000);
 insert into bar (foo_id, name) select (g%10)+1, random()::varchar from
 generate_series(1,1) g;
 explain select * from foo inner join (select foo_id, sum(1) from bar
 group by 1) x on foo.id = x.foo_id where x.foo_id = 1;

 ...Robert


Ah your example works for me, too. My issue is:

explain select * from foo inner join (select foo_id, sum(1) from bar
group by 1) x on foo.id = x.foo_id where foo.id = 1;

where foo.id = 1 (not where x.foo_id = 1).
And I now figured out it's another problem.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Robert Haas
On Thu, Dec 17, 2009 at 10:13 PM, Robert Haas robertmh...@gmail.com wrote:
 Another question I have - while generalizing the inner-indexscan
 machinery is an interesting join optimization technique, I'm thinking
 that it actually has very little to do with LATERAL.  Is there any
 reason to suppose that one or the other needs to be done first?

And the winner is... yes.  Or at least, I think so.  One of the major
reasons why people want LATERAL() is for SRFs, but currently, even if
you beat the code into allowing a SRF with an outer reference, the
planner can easily be persuaded to run the SRF on the outer side of a
join with the dependency as the inner side, which ain't gonna work.
(Even you jigger the query so that the planner gets them on the
correct sides of the join, the executor fails, but that's a different
problem.)

The idea Tom came up with back in October is to allow paths to be
tagged with a set of rels to which they must in the future be joined
in order for the path to be allowable.  The point of that exercise was
to generalize the current inner-indexscan machinery so that we can
create that type of plan in match_unsorted_outer() even when the inner
side is a joinrel.  But, it strikes me that what we need to allow a
function scan with an outer reference is remarkably similar - the
function scan can only be used as the inner side of a nestloop with a
certain set of rels on the outer side.

On the other hand, it's not exactly the same, either.  In the case of
a construct like A LJ (B IJ C), partial-index scan paths for B and C
will require a subsequent nest-join to A to become fully valid, but
there will also be other paths that don't.  But for something like A,
LATERAL (some_srf(A.x)), the ONLY path for the rel defined by
some_srf(A.x) has a future-join requirement of {A}.  It's not clear to
me whether there's anything useful that can be done with this
knowledge.

Incidentally, the reason why the executor chokes trying to execute a
SRF with an outer reference is because ExecEvalVar() craps out trying
to dereference a null TupleTableSlot.  If I'm understanding this
correctly, that, in turn, happens because the variable that we're
trying to deference is marked as neither INNER nor OUTER, so it's
assumed to be from a scan, but there's no scan node.  Going even
further from my area of actually understanding what's going on, I
think this needs to be fixed by adjusting setrefs.c.  Allowing
LATERAL(), or for that matter the generalized inner-index scan stuff,
will I think mean that set_inner_join_references() will need to handle
a lot more cases than it current does.  I don't understand this code
well enough to begin to speculate as to what should happen here.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Incidentally, the reason why the executor chokes trying to execute a
 SRF with an outer reference is because ExecEvalVar() craps out trying
 to dereference a null TupleTableSlot.  If I'm understanding this
 correctly, that, in turn, happens because the variable that we're
 trying to deference is marked as neither INNER nor OUTER, so it's
 assumed to be from a scan, but there's no scan node.  Going even
 further from my area of actually understanding what's going on, I
 think this needs to be fixed by adjusting setrefs.c.

Well, no: we can't handle such references as OUTER vars because the
OUTER slot is likely to be in use already in the sub-join.  It would be
even messier if you wanted several references to different outer
relations.

I believe the correct approach is probably to treat values that need to
be propagated into the inner side as executor parameters.  This could
replace the existing, rather crocky, management of values passed into a
nestloop inner indexscan.  The mechanisms that deal with forcing rescans
of subplans affected by a changed parameter value would be very helpful
here too.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Robert Haas
On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Incidentally, the reason why the executor chokes trying to execute a
 SRF with an outer reference is because ExecEvalVar() craps out trying
 to dereference a null TupleTableSlot.  If I'm understanding this
 correctly, that, in turn, happens because the variable that we're
 trying to deference is marked as neither INNER nor OUTER, so it's
 assumed to be from a scan, but there's no scan node.  Going even
 further from my area of actually understanding what's going on, I
 think this needs to be fixed by adjusting setrefs.c.

 Well, no: we can't handle such references as OUTER vars because the
 OUTER slot is likely to be in use already in the sub-join.  It would be
 even messier if you wanted several references to different outer
 relations.

Oh.  Yeah.

 I believe the correct approach is probably to treat values that need to
 be propagated into the inner side as executor parameters.  This could
 replace the existing, rather crocky, management of values passed into a
 nestloop inner indexscan.  The mechanisms that deal with forcing rescans
 of subplans affected by a changed parameter value would be very helpful
 here too.

What is the best place to look for the existing, rather crocky code?
I have to admit that the whole mechanism by which paths get
transformed into plans and handed off to the executor is still rather
opaque to me.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I believe the correct approach is probably to treat values that need to
 be propagated into the inner side as executor parameters.  This could
 replace the existing, rather crocky, management of values passed into a
 nestloop inner indexscan.

 What is the best place to look for the existing, rather crocky code?

Follow the second argument of ExecReScan from nodeNestloop to
nodeIndexscan.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Robert Haas
On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I believe the correct approach is probably to treat values that need to
 be propagated into the inner side as executor parameters.  This could
 replace the existing, rather crocky, management of values passed into a
 nestloop inner indexscan.

 What is the best place to look for the existing, rather crocky code?

 Follow the second argument of ExecReScan from nodeNestloop to
 nodeIndexscan.

Yeah, this is grotty.  It appears that the comment introducing
ExecReScan() is somewhat incorrect.  It asserts that exprCtxt is used
only

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Robert Haas
On Sat, Dec 19, 2009 at 11:01 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I believe the correct approach is probably to treat values that need to
 be propagated into the inner side as executor parameters.  This could
 replace the existing, rather crocky, management of values passed into a
 nestloop inner indexscan.

 What is the best place to look for the existing, rather crocky code?

 Follow the second argument of ExecReScan from nodeNestloop to
 nodeIndexscan.

 Yeah, this is grotty.  It appears that the comment introducing
 ExecReScan() is somewhat incorrect.  It asserts that exprCtxt is used
 only

Sigh.

...is used only for index scans.  However, it's actually also used for
bitmap scans (both heap and index) and TID scans.  Also, there appears
to be an effort by nodes that don't use exprCtxt directly to propagate
down through the node tree, which doesn't seem to make much sense if
this is only intended to be used on the inner side of a nestloop.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Yeah, this is grotty.  It appears that the comment introducing
 ExecReScan() is somewhat incorrect.  It asserts that exprCtxt is used
 only

 Sigh.

 ...is used only for index scans.  However, it's actually also used for
 bitmap scans (both heap and index) and TID scans.

Yeah, the comment was probably correct when written.

 Also, there appears
 to be an effort by nodes that don't use exprCtxt directly to propagate
 down through the node tree, which doesn't seem to make much sense if
 this is only intended to be used on the inner side of a nestloop.

That's just dead code, which is a good thing because the coverage is
pretty incomplete.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-12-17 Thread Robert Haas
On Sun, Oct 18, 2009 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 You could probably convince me that a merge join is not going to be
 too useful (how often can you want a merge join on the inner side of a
 nested loop?

 Why not?  As Andrew pointed out, what we're really trying to accomplish
 here is consider sub-join plans that are parameterized by a value
 obtained from an outer relation.  I think we shouldn't artificially
 limit what we consider.

 But anyway I think we're on the same page here: what we ought to do is
 try implementing this scheme without any extra restrictions on what it
 considers, and see what the performance is like.  We can try to limit
 what it considers if it turns out not to work well in the simplest
 form.

You mention what we ought to do here, so - is this something that
you're planning to have a go at?

Another question I have - while generalizing the inner-indexscan
machinery is an interesting join optimization technique, I'm thinking
that it actually has very little to do with LATERAL.  Is there any
reason to suppose that one or the other needs to be done first?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-19 Thread Greg Stark
On Sun, Oct 18, 2009 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 You could probably convince me that a merge join is not going to be
 too useful (how often can you want a merge join on the inner side of a
 nested loop?

 Why not?  As Andrew pointed out, what we're really trying to accomplish
 here is consider sub-join plans that are parameterized by a value
 obtained from an outer relation.  I think we shouldn't artificially
 limit what we consider.

Am I understanding you right that a typical case of this might be something like

nested loop
index scan expecting 1 record
merge join
index scan on partial index where col = outer.foo and col2
between a and b
some other scan

or

nested loop
index scan expecting 1 record
merge join
index scan on col1,col2 where col1 = outer.foo and col2
between a and b
some other scan

Ie, where the nested loop is a degenerate nested loop which only
expects a single value and provides a parameter which allows some
partial index to work or allows for some other index scan by providing
a higher order key element?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-19 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 nested loop
 index scan expecting 1 record
 merge join
 index scan on col1,col2 where col1 = outer.foo and col2
 between a and b
 some other scan

 Ie, where the nested loop is a degenerate nested loop which only
 expects a single value and provides a parameter which allows some
 partial index to work or allows for some other index scan by providing
 a higher order key element?

Right.  I don't see any particular reason to assume the inner path
is iterated only once, either.  If the key value coming from the outer
path is sufficiently useful, this could be a win even with multiple
iterations, as compared to having to scan the whole of some large
relation or other ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-19 Thread Andrew Gierth
 Greg == Greg Stark gsst...@mit.edu writes:

  Why not?  As Andrew pointed out, what we're really trying to
  accomplish here is consider sub-join plans that are parameterized
  by a value obtained from an outer relation.  I think we shouldn't
  artificially limit what we consider.

 Greg Am I understanding you right that a typical case of this might
 Greg be something like

 Greg nested loop
 Greg index scan expecting 1 record
 Greg merge join
 Greg index scan on partial index where col = outer.foo and col2
 Greg between a and b
 Greg some other scan

no, because you could never pick the partial index at plan time.

 Greg or

 Greg nested loop
 Greg index scan expecting 1 record
 Greg merge join
 Greg index scan on col1,col2 where col1 = outer.foo and col2
 Greg between a and b
 Greg some other scan

 Greg Ie, where the nested loop is a degenerate nested loop which
 Greg only expects a single value and provides a parameter which
 Greg allows some partial index to work or allows for some other
 Greg index scan by providing a higher order key element?

The nested loop does NOT have to be degenerate. Consider queries of
this form:

select * from small
  left join (big1 join big2 on (big1.id=big2.id))
  on (small.id=big1.id);

Right now, the only way pg can plan this is to do a hashjoin or
mergejoin of the _entire content of big1 and big2_ and join the
result against small (again in a hashjoin or mergejoin plan).
This becomes excessively slow compared to the ideal plan:

  nested loop
  seqscan on small
  nested loop
 indexscan on big1 where id=small.id
 indexscan on big2 where id=small.id (or big1.id which is equiv)

(The same argument applies if small is not actually small but has
restriction clauses)

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-18 Thread Robert Haas
On Sat, Oct 17, 2009 at 10:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 That still leaves a lot of silly paths, though.  In many cases, if
 you're thinking about joining A to {B C} using an index-accelerated
 path, you'd be just as well off joining to B first and then to C.  So
 it might be that we only need to consider index-accelerated paths when
 there is no legal join between the LHS and a subset of the RHS.

 Yeah.  If there are no join order constraints, it's always possible to
 form a plan such that you join a given rel only once you have all the
 rels needed (to provide values for the inner indexscan) on the lefthand
 side.  I think this is probably the main reason why the issue was not
 treated in the planner originally.  So maybe the way to think about
 this is as a way of dealing with join order constraints without losing
 the benefits of inner indexscans.

 The other problem I see here is that the bottom-up approach that we
 use in general is going to be difficult to apply here, because the set
 of paths will vary depending on what parameters are pushed down from
 the outer side.

 Well, we deal with that already --- the set of possible inner indexscan
 paths already varies depending on what the LHS is.  I think the point
 here is that we'd be considering an inner path that's against an LHS
 that it's not legal to join the inner rel to *directly*.  Such a path
 would only be legal if we later join to that LHS at a higher join level.
 So we'd be keeping around some tentative paths that might not ever form
 a valid join plan.

 Maybe we should turn around the way that inner indexscan paths are
 made.  Currently we form them on-the-fly while considering a valid
 join combination.  Maybe we should build them all at the first level
 (driving this off the set of available join clauses for each base rel)
 and mark each such path as requires a join to this other set of rels
 to be valid.  But then we'd go ahead and join such paths to *other*
 rels, keeping the resulting join paths still marked as requiring the
 same future join.  Once that join actually happens, the resulting path
 becomes fully valid.  Only a join to a proper subset of the future-join
 requirement would be disallowed meanwhile.

 I'm not even sure this would be slower or more complicated than what we
 do now --- if you look at the logic that caches potential inner
 indexscan plans, it's almost doing this already.  It would result in
 considering more join paths, but only ones that have some plausible use.

Wow, that's pretty sneaky.  I had to read this email twice before I
understood what you were proposing.  It sounds to me like it will
work.  I'm not 100% sure what the impact on planner performance will
be, but it's at least plausible that it will be OK.

I think you should only ever join an incomplete inner-indexscan path
to (1) another inner-indexscan path or (2) the cheapest total path for
*exactly* the future-join requirement.  Anything else doesn't seem
productive.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think you should only ever join an incomplete inner-indexscan path
 to (1) another inner-indexscan path or (2) the cheapest total path for
 *exactly* the future-join requirement.  Anything else doesn't seem
 productive.

I don't see any reason to constrain the form of joins before the
future-join requirement.  Once you get to the join that satisfies
that requirement, obviously you must implement it as a nestloop
with the inner-indexscan path on the inside.  But there shouldn't
be any more constraint beyond that one for that join, either:
* you might want a fast-start plan not cheapest total
* sort orderings of the outer path might be useful

We know that sort ordering of the inner indexscan or its joins
won't be useful once we reach the future-join level, so it might
be possible to discard paths that are not cheapest total cost
below that level.  The problem is to distinguish sort orderings
that are useful within joins below that level.  You could avoid that
if you could convince yourself that there's no point in ever doing
a mergejoin at such a level, but I don't think I believe that.

It may well be that there's no point in being too tense about this issue
anyway.  The planner will only consider early joins to rels that have
join clauses to the rel with the inner-indexscan path.  There probably
won't be that many of them.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-18 Thread Robert Haas
On Sun, Oct 18, 2009 at 11:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think you should only ever join an incomplete inner-indexscan path
 to (1) another inner-indexscan path or (2) the cheapest total path for
 *exactly* the future-join requirement.  Anything else doesn't seem
 productive.

 I don't see any reason to constrain the form of joins before the
 future-join requirement.  Once you get to the join that satisfies
 that requirement, obviously you must implement it as a nestloop
 with the inner-indexscan path on the inside.  But there shouldn't
 be any more constraint beyond that one for that join, either:
        * you might want a fast-start plan not cheapest total
        * sort orderings of the outer path might be useful

 We know that sort ordering of the inner indexscan or its joins
 won't be useful once we reach the future-join level, so it might
 be possible to discard paths that are not cheapest total cost
 below that level.

Yeah, this was what I was driving at, but I got turned around in my
head and was explaining it incorrectly.

 The problem is to distinguish sort orderings
 that are useful within joins below that level.  You could avoid that
 if you could convince yourself that there's no point in ever doing
 a mergejoin at such a level, but I don't think I believe that.

 It may well be that there's no point in being too tense about this issue
 anyway.  The planner will only consider early joins to rels that have
 join clauses to the rel with the inner-indexscan path.  There probably
 won't be that many of them.

You could probably convince me that a merge join is not going to be
too useful (how often can you want a merge join on the inner side of a
nested loop?  ... especially when there are partial index scans
involved) but I think your last point here is well-taken.  We've
certainly turned a corner from it's just the exponential growth in
the number of join paths to consider that's a problem.  :-)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-18 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 You could probably convince me that a merge join is not going to be
 too useful (how often can you want a merge join on the inner side of a
 nested loop?

Why not?  As Andrew pointed out, what we're really trying to accomplish
here is consider sub-join plans that are parameterized by a value
obtained from an outer relation.  I think we shouldn't artificially
limit what we consider.

But anyway I think we're on the same page here: what we ought to do is
try implementing this scheme without any extra restrictions on what it
considers, and see what the performance is like.  We can try to limit
what it considers if it turns out not to work well in the simplest
form.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-18 Thread Robert Haas
On Sun, Oct 18, 2009 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 You could probably convince me that a merge join is not going to be
 too useful (how often can you want a merge join on the inner side of a
 nested loop?

 Why not?  As Andrew pointed out, what we're really trying to accomplish
 here is consider sub-join plans that are parameterized by a value
 obtained from an outer relation.  I think we shouldn't artificially
 limit what we consider.

 But anyway I think we're on the same page here: what we ought to do is
 try implementing this scheme without any extra restrictions on what it
 considers, and see what the performance is like.  We can try to limit
 what it considers if it turns out not to work well in the simplest
 form.

And then when we get done we can consider implementing $SUBJECT, which
is no longer what this thread is about at all.  :-)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-17 Thread Robert Haas
On Tue, Sep 22, 2009 at 11:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Currently, however, we only consider this possibility when the inner
 rel is NOT a joinrel.  It seems like it might be possible to change
 this, but it doesn't look straightforward.

 Well, it's straightforward enough in theory, it's just the exponential
 growth in the number of join paths to consider that's a problem :-(.
 I think what we'd need is a heuristic to limit the paths considered.

[ picking this up again now that CommitFest is over ]

So that I don't have to keep referring to this possibility and
similar circumlocution, let's define an index-accelerated path (feel
free to suggest a different term) to be the generalization to joinrels
of what we now call a nested inner index-scan.  In other words, they
can only be usefully used on the inner side of a nested loop, where
they'll be rescanned with different parameters for each outer row, and
they can only ever win when some part of the path involves a *partial*
index scan that can used one of the passed parameters as an index
qual.

For a fixed a set of parameters to be pushed down from the outer side,
it seems to me that we could build up a set of index-accelerated paths
for a joinrel {A B} by considering the combination of (1) an
index-accelerated path for A joined to an index-accelerated path for
B, or (2) an index-accelerated path for A joined to the cheapest total
path for B.  I believe we don't need to worry about path keys because
this will only ever be used on the inner side of a nested loop, so the
pathkeys will be ignored anyway.  In other words, the first heuristic
is that you can't build up an index-accelerated path for the joinrel
unless there is an index-accelerated path for a least one of its
baserels.

That still leaves a lot of silly paths, though.  In many cases, if
you're thinking about joining A to {B C} using an index-accelerated
path, you'd be just as well off joining to B first and then to C.  So
it might be that we only need to consider index-accelerated paths when
there is no legal join between the LHS and a subset of the RHS.  But
I'm not completely sure that I'm right about this, nor whether it's
easy to check for.

The other problem I see here is that the bottom-up approach that we
use in general is going to be difficult to apply here, because the set
of paths will vary depending on what parameters are pushed down from
the outer side.

 I think Andrew was suggesting that we not attempt to consider this
 automatically, but only when the user writes the query in a way that
 exposes it directly via LATERAL.  While that goes against my normal
 instincts for the planner, it isn't unreasonable as a first step.

Possibly, but I'm not sure we have a good enough design sketch at this
point to even know whether such a restriction would be useful.

Another thought, only semi-related.  One of the big use cases for
LATERAL in general is to use a set-returning function in the FROM
clause that uses vars from a preceding FROM item.  I am idly wondering
if there's a reason why ExecProject is not its own node type.  It
seems that we have close to the same logic scattered through a whole
bunch of different node types...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Another thought, only semi-related.  One of the big use cases for
 LATERAL in general is to use a set-returning function in the FROM
 clause that uses vars from a preceding FROM item.  I am idly wondering
 if there's a reason why ExecProject is not its own node type.

You generally need it everywhere.  Scan nodes need it because you don't
want unused columns propagating upwards, and join nodes need it because
the two input tuples have to be unified somehow.  We do skip projection
ability in a few node types, but I doubt it would be profitable to
remove it from the rest.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-10-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 That still leaves a lot of silly paths, though.  In many cases, if
 you're thinking about joining A to {B C} using an index-accelerated
 path, you'd be just as well off joining to B first and then to C.  So
 it might be that we only need to consider index-accelerated paths when
 there is no legal join between the LHS and a subset of the RHS.

Yeah.  If there are no join order constraints, it's always possible to
form a plan such that you join a given rel only once you have all the
rels needed (to provide values for the inner indexscan) on the lefthand
side.  I think this is probably the main reason why the issue was not
treated in the planner originally.  So maybe the way to think about
this is as a way of dealing with join order constraints without losing
the benefits of inner indexscans.

 The other problem I see here is that the bottom-up approach that we
 use in general is going to be difficult to apply here, because the set
 of paths will vary depending on what parameters are pushed down from
 the outer side.

Well, we deal with that already --- the set of possible inner indexscan
paths already varies depending on what the LHS is.  I think the point
here is that we'd be considering an inner path that's against an LHS
that it's not legal to join the inner rel to *directly*.  Such a path
would only be legal if we later join to that LHS at a higher join level.
So we'd be keeping around some tentative paths that might not ever form
a valid join plan.

Maybe we should turn around the way that inner indexscan paths are
made.  Currently we form them on-the-fly while considering a valid
join combination.  Maybe we should build them all at the first level
(driving this off the set of available join clauses for each base rel)
and mark each such path as requires a join to this other set of rels
to be valid.  But then we'd go ahead and join such paths to *other*
rels, keeping the resulting join paths still marked as requiring the
same future join.  Once that join actually happens, the resulting path
becomes fully valid.  Only a join to a proper subset of the future-join
requirement would be disallowed meanwhile.

I'm not even sure this would be slower or more complicated than what we
do now --- if you look at the logic that caches potential inner
indexscan plans, it's almost doing this already.  It would result in
considering more join paths, but only ones that have some plausible use.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-22 Thread Robert Haas
On Wed, Sep 9, 2009 at 11:25 PM, Andrew Gierth
and...@tao11.riddles.org.uk wrote:
   4. LATERAL allows some optimizations that aren't currently done, either
   by explicitly rewriting the query, or (in theory) the optimizer itself
   could consider a lateral plan (I believe Oracle does this). This would
   apply to queries of this form:
  
   SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);
  
   which currently forces the t2/t3 join to occur first even where t1 is
   small; this could be rewritten with LATERAL as:
  
   SELECT ...
          FROM t1
          LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
                              where t2.a=t1.a) s
          ON true;

  Robert Well, you haven't actually commuted the joins here - how do
  Robert you have in mind for PostgreSQL to execute this?  I'm
  Robert guessing that it's something like a nest loop with t1 as the
  Robert outer side and the lateral subquery as the inner side, so
  Robert that the executor repeatedly executes
  Robert select * from t2 join t3 on t2.a = t3.a where t2.a = $1?

 Yup.

 The current execution plans for this type of query are completely
 disastrous if t1 is small (or qualified so as to be small) and t2 and
 t3 are large. Having LATERAL would allow the query to be rewritten to
 perform reasonably; a bonus would be for the planner to consider the
 lateral join automatically without requiring it to be explicitly
 requested.

I've been turning this one over in my head.  It seems to me that this
is very similar to what we already do with inner index-scans, only
generalized to joinrels.  When we consider nested loop plans in
match_unsorted_outer(), we really consider two quite different types
of plans - call them parameterized and unparameterized.  The
unparameterized variants considered regardless of the details of the
inner plan, and basically rescan the same identical inner side once
per outer row, possibly with a materialize node interposed.  The
parameterized variants are what the inner index-scan code is looking
at: they also involve rescanning the inner path, but not the same set
of tuples every time.   Instead, we look for cases where an index is
available that can support a lookup of the specific values that the
outer side needs on each pass as an index condition.

Currently, however, we only consider this possibility when the inner
rel is NOT a joinrel.  It seems like it might be possible to change
this, but it doesn't look straightforward.  When the inner rel is a
baserel, there's really nothing to do except grovel through the
indexes looking for something useful, and then estimate the cost of
using it.  When the inner rel is a joinrel, it's a lot more
complicated.  A parameterized nested loop might be right even if only
some of the rels making up the joinrel are indexed (imagine t2 IJ t3
IJ t4 where t2 and t3 are large and indexed and t4 is small and
unindexed).  In addition, the relations making up the inner rel could
be joined in any order and using a variety of strategies.  The path
that is best when trying to suck down the ENTIRE inner rel doesn't
necessarily bear any resemblence to the plan that is best when trying
to suck down only a part of it.

To some extent, this seems like a tangent as far as LATERAL is
concerned: I think the primary reason why people want LATERAL
(certainly, why I want it) is for SRFs, for which there isn't any
choice of execution plan anyway.  But it's certainly an interesting
optimization problem, and I wish I had some better ideas on how to
tackle it.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I've been turning this one over in my head.  It seems to me that this
 is very similar to what we already do with inner index-scans, only
 generalized to joinrels.

Right.

 Currently, however, we only consider this possibility when the inner
 rel is NOT a joinrel.  It seems like it might be possible to change
 this, but it doesn't look straightforward.

Well, it's straightforward enough in theory, it's just the exponential
growth in the number of join paths to consider that's a problem :-(.
I think what we'd need is a heuristic to limit the paths considered.

I think Andrew was suggesting that we not attempt to consider this
automatically, but only when the user writes the query in a way that
exposes it directly via LATERAL.  While that goes against my normal
instincts for the planner, it isn't unreasonable as a first step.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-09 Thread Andrew Gierth
 Robert == Robert Haas robertmh...@gmail.com writes:

  Just to pick up on some points from the discussion:
  
  1. LATERAL has to be explicit because it changes the scope of
  references.  For example, in:
  ... (select ... FROM (select a AS b), (select b)) ...
  the b in the second subselect could be an outer reference, but
  it can't be a reference to the first subquery; whereas in:
  ... (select ... FROM (select a AS b), LATERAL (select b)) ...
  the b in the second subselect refers to the result of the first
  subselect.

 Robert Can you provide a more complete example?  I'm unable to
 Robert construct a working example of this type.  For example:

 Robert rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1;
 Robert ERROR:  subquery in FROM cannot refer to other relations of same query
 Robert level at character 50

That looks like a bug to me. The spec is explicit that the inner definition
of b is not in scope in the second subquery, and therefore that should parse
as an outer reference.

  2. LATERAL in general constrains both the join order and the join
  plan, assuming any lateral references are actually made.

 Robert Peter seemed to be saying that LATERAL() must syntactically
 Robert follow the same-level FROM items to which it refers.  Is that
 Robert your understanding also?

LATERAL references must be to items defined in the same FROM clause and
to the left of the LATERAL.

The relevant language of the spec seems to be:

  a) If TR is contained in a from clause FC with no intervening
 query expression, then the scope clause SC of TR is the select
 statement: single row or innermost query specification that
 contains FC.  The scope of a range variable of TR is the select
 list, where clause, group by clause, having clause, and
 window clause of SC, together with every lateral derived table
 that is simply contained in FC and is preceded by TR, and every
 collection derived table that is simply contained in FC and is
 preceded by TR, and the join condition of all joined tables
 contained in SC that contain TR. If SC is the query specification
 that is the query expression body of a simple table query STQ,
 then the scope of a range variable of TR also includes the order
 by clause of STQ.

  4. LATERAL allows some optimizations that aren't currently done, either
  by explicitly rewriting the query, or (in theory) the optimizer itself
  could consider a lateral plan (I believe Oracle does this). This would
  apply to queries of this form:
  
  SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);
  
  which currently forces the t2/t3 join to occur first even where t1 is
  small; this could be rewritten with LATERAL as:
  
  SELECT ...
 FROM t1
 LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
 where t2.a=t1.a) s
 ON true;

 Robert Well, you haven't actually commuted the joins here - how do
 Robert you have in mind for PostgreSQL to execute this?  I'm
 Robert guessing that it's something like a nest loop with t1 as the
 Robert outer side and the lateral subquery as the inner side, so
 Robert that the executor repeatedly executes
 Robert select * from t2 join t3 on t2.a = t3.a where t2.a = $1?

Yup.

The current execution plans for this type of query are completely
disastrous if t1 is small (or qualified so as to be small) and t2 and
t3 are large. Having LATERAL would allow the query to be rewritten to
perform reasonably; a bonus would be for the planner to consider the
lateral join automatically without requiring it to be explicitly
requested.

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-08 Thread Peter Eisentraut
On mån, 2009-09-07 at 19:06 -0400, Robert Haas wrote:
 On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentrautpete...@gmx.net wrote:
  Because joins can be reordered, whereas LATERAL creates a kind of
  syntactic sequence point for join reordering.  To pick up your example:
 
  But this doesn't [work]:
 
  select g, h from generate_series(1,10) g, generate_series(1,g) h;
 
  You need to constrain the order of the from items in some way so the g
  refers to something well-defined.  That's what LATERAL does.
 
 I don't think so.  All joins constrain the join order,

I carefully did not say constrain the join order but constraint the
order of the from items and a *syntactic* sequence point.  If the
order of the from items is free, then you could also write the above
example as

select g, h from generate_series(1,g) h, generate_series(1,10) g;

but that would no longer be valid with LATERAL in there.

  but none of
 them except FULL JOIN constrain it completely, and this is no
 exception.

FWIW, the spec says somewhere that LATERAL is not allowed at or near an
outer join.  I'll have to read up on it again.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-08 Thread Andrew Gierth
 David == David Fetter da...@fetter.org writes:

  I've attempted to search the archives for references to the SQL
  LATERAL feature, which AIUI is fairly-frequently requested.
  [snip]
  Has anyone poked at this at all?

 David I believe Andrew (RhodiumToad) Gierth is taking a look at
 David implementing the full LATERAL feature from SQL:2008.

I've looked, but I've not actually had time to try any actual work on
implementation, so anyone else who fancies a go shouldn't hesitate.

Just to pick up on some points from the discussion:

1. LATERAL has to be explicit because it changes the scope of
references.  For example, in:
... (select ... FROM (select a AS b), (select b)) ...
the b in the second subselect could be an outer reference, but
it can't be a reference to the first subquery; whereas in:
... (select ... FROM (select a AS b), LATERAL (select b)) ...
the b in the second subselect refers to the result of the first
subselect.

2. LATERAL in general constrains both the join order and the join
plan, assuming any lateral references are actually made.

3. LATERAL specifically IS allowed with left outer joins, though the
syntax productions in the spec are sufficiently obscure that this
isn't obvious.  In general there are (as far as I can tell from the
syntax rules) two ways to use it:

SELECT ... FROM foo, LATERAL (bar)

or

SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ...

Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded
(syntax rule 2 for joined table).

4. LATERAL allows some optimizations that aren't currently done, either
by explicitly rewriting the query, or (in theory) the optimizer itself
could consider a lateral plan (I believe Oracle does this). This would
apply to queries of this form:

SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);

which currently forces the t2/t3 join to occur first even where t1 is
small; this could be rewritten with LATERAL as:

SELECT ...
  FROM t1
   LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
   where t2.a=t1.a) s
   ON true;

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-08 Thread Robert Haas
On Tue, Sep 8, 2009 at 6:29 PM, Andrew
Gierthand...@tao11.riddles.org.uk wrote:
 David == David Fetter da...@fetter.org writes:

   I've attempted to search the archives for references to the SQL
   LATERAL feature, which AIUI is fairly-frequently requested.
   [snip]
   Has anyone poked at this at all?

  David I believe Andrew (RhodiumToad) Gierth is taking a look at
  David implementing the full LATERAL feature from SQL:2008.

 I've looked, but I've not actually had time to try any actual work on
 implementation, so anyone else who fancies a go shouldn't hesitate.

Thanks for your thoughts - I appreciate it.

 Just to pick up on some points from the discussion:

 1. LATERAL has to be explicit because it changes the scope of
 references.  For example, in:
 ... (select ... FROM (select a AS b), (select b)) ...
 the b in the second subselect could be an outer reference, but
 it can't be a reference to the first subquery; whereas in:
 ... (select ... FROM (select a AS b), LATERAL (select b)) ...
 the b in the second subselect refers to the result of the first
 subselect.

Can you provide a more complete example?  I'm unable to construct a
working example of this type.  For example:

rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1;
ERROR:  subquery in FROM cannot refer to other relations of same query
level at character 50

Though this works as expected:
rhaas=# select (select 1 from (select a) x, (select b) y) from t1;

 2. LATERAL in general constrains both the join order and the join
 plan, assuming any lateral references are actually made.

Peter seemed to be saying that LATERAL() must syntactically follow the
same-level FROM items to which it refers.  Is that your understanding
also?

 3. LATERAL specifically IS allowed with left outer joins, though the
 syntax productions in the spec are sufficiently obscure that this
 isn't obvious.  In general there are (as far as I can tell from the
 syntax rules) two ways to use it:

 SELECT ... FROM foo, LATERAL (bar)

 or

 SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ...

 Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded
 (syntax rule 2 for joined table).

Makes sense to me.

 4. LATERAL allows some optimizations that aren't currently done, either
 by explicitly rewriting the query, or (in theory) the optimizer itself
 could consider a lateral plan (I believe Oracle does this). This would
 apply to queries of this form:

 SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);

 which currently forces the t2/t3 join to occur first even where t1 is
 small; this could be rewritten with LATERAL as:

 SELECT ...
  FROM t1
       LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
                                   where t2.a=t1.a) s
       ON true;

Well, you haven't actually commuted the joins here - how do you have
in mind for PostgreSQL to execute this?  I'm guessing that it's
something like a nest loop with t1 as the outer side and the lateral
subquery as the inner side, so that the executor repeatedly executes
select * from t2 join t3 on t2.a = t3.a where t2.a = $1?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-07 Thread Peter Eisentraut
On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote:
 Based on reading through this discussion, it appears that LATERAL is
 mostly a bit of syntactic sugar that requests that the parser allow
 you to reference tables at the same query level.  Assuming that the
 necessary executor support were present (which it's currently not),
 it's unclear to me why one couldn't simply allow such references
 unconditionally.

Because joins can be reordered, whereas LATERAL creates a kind of
syntactic sequence point for join reordering.  To pick up your example:

 But this doesn't [work]:
 
 select g, h from generate_series(1,10) g, generate_series(1,g) h;

You need to constrain the order of the from items in some way so the g
refers to something well-defined.  That's what LATERAL does.

You could argue that the parser could infer the references and the
resultant join ordering restrictions automatically, but perhaps it was
deemed that an explicit specification would be less error-prone.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-07 Thread Robert Haas
On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentrautpete...@gmx.net wrote:
 On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote:
 Based on reading through this discussion, it appears that LATERAL is
 mostly a bit of syntactic sugar that requests that the parser allow
 you to reference tables at the same query level.  Assuming that the
 necessary executor support were present (which it's currently not),
 it's unclear to me why one couldn't simply allow such references
 unconditionally.

 Because joins can be reordered, whereas LATERAL creates a kind of
 syntactic sequence point for join reordering.  To pick up your example:

 But this doesn't [work]:

 select g, h from generate_series(1,10) g, generate_series(1,g) h;

 You need to constrain the order of the from items in some way so the g
 refers to something well-defined.  That's what LATERAL does.

I don't think so.  All joins constrain the join order, but none of
them except FULL JOIN constrain it completely, and this is no
exception.  Consider this query:

select * from foo f, generate_series(1, 10) g, generate_series(1, g) h
WHERE f.x = g;

There are two legal join orderings here: f {g h} and {f g} h, just as
there would be for a three-way inner join:

select * from foo f, goo g, hoo h WHERE f.x = g.x and g.x = h.x;

I believe that the join-order restrictions imposed by a LATERAL SRF
are identical to those that would be imposed by an inner join against
a table with join clauses referencing the same tables used in the
inputs to the SRF.  What is different is that there is only one
possible method of implementing the join, namely, for each outer row,
evaluate the SRF.  We can't hash or merge, and we can't swap the inner
and outer rels, but we do still have a choice as to whether to join
against h before or after joining against f.

 You could argue that the parser could infer the references and the
 resultant join ordering restrictions automatically, but perhaps it was
 deemed that an explicit specification would be less error-prone.

Well, the irony is that our current code does already infer the
references - and then disallows them.  It seems to me that if we
implement LATERAL(), we're basically going to be conditionalizing
those checks on whether the relevant subexpression has been wrapped in
LATERAL or not.  Introducing a new keyword would make sense if it were
possible to write a query that is valid without LATERAL(), but means
something different with LATERAL() - but I'm currently unable to
devise such a scenario.  Whether this is for want of creativity or
because none exists I'm not entirely sure.  Any ideas?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-07 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentrautpete...@gmx.net wrote:
 You could argue that the parser could infer the references and the
 resultant join ordering restrictions automatically, but perhaps it was
 deemed that an explicit specification would be less error-prone.

 Well, the irony is that our current code does already infer the
 references - and then disallows them.

Because as often as not, they're mistakes.  Please don't think
you're smarter than the spec here.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-07 Thread Robert Haas
On Mon, Sep 7, 2009 at 7:47 PM, Tom Lanet...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentrautpete...@gmx.net wrote:
 You could argue that the parser could infer the references and the
 resultant join ordering restrictions automatically, but perhaps it was
 deemed that an explicit specification would be less error-prone.

 Well, the irony is that our current code does already infer the
 references - and then disallows them.

 Because as often as not, they're mistakes.  Please don't think
 you're smarter than the spec here.

You're frequently the first to criticize the spec, but I have no
interest in second-guessing whatever behavior the spec specifies for
this construct.  I'm just trying to understand it, and as far as I can
tell, LATERAL() is just a piece of syntactic sugar that allows
expressions within to reference FROM items at the same query level.
As far as I can tell, it doesn't change the semantic of any legal
queries - it just renders legal notation that otherwise would be
rejected.  But is my understanding correct?

I haven't got a copy of the spec, so that's a bit of a handicap.  If
someone who does can look this up and comment on how it's supposed to
work, I would certainly appreciate that.  My understanding of it is
currently based on random articles cherry-picked around the Internet
and a handful of emails from archives.postgresql.org, which seems a
little thin.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-07 Thread Alvaro Herrera
Robert Haas escribió:

 I haven't got a copy of the spec, so that's a bit of a handicap.  If
 someone who does can look this up and comment on how it's supposed to
 work, I would certainly appreciate that.  My understanding of it is
 currently based on random articles cherry-picked around the Internet
 and a handful of emails from archives.postgresql.org, which seems a
 little thin.

http://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-07 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Sep 7, 2009 at 7:47 PM, Tom Lanet...@sss.pgh.pa.us wrote:
  Because as often as not, they're mistakes.  Please don't think
  you're smarter than the spec here.
 
 You're frequently the first to criticize the spec, but I have no
 interest in second-guessing whatever behavior the spec specifies for
 this construct.

I'm not entirely sure you followed what Tom was getting at here.  If you
did, feel free to ignore me.

 I'm just trying to understand it, and as far as I can
 tell, LATERAL() is just a piece of syntactic sugar that allows
 expressions within to reference FROM items at the same query level.

What I'm gathering is that this may be correct, though I don't know for
sure.  The point I think Tom was making is that even if it *is* just
syntactic sugar, we don't want to allow expressions to reference FROM
items at the same query level *unless* LATERAL is specified.  Your
earlier comments sounded like you would want to implement allowing
expressions to refer to FROM items at the same query level without
LATERAL being specified.

 I haven't got a copy of the spec, so that's a bit of a handicap.  If
 someone who does can look this up and comment on how it's supposed to
 work, I would certainly appreciate that.  My understanding of it is
 currently based on random articles cherry-picked around the Internet
 and a handful of emails from archives.postgresql.org, which seems a
 little thin.

You can get a 'draft' that's very close to the spec pretty easily..
Just do '??sql' in IRC sometime..

Enjoy,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] LATERAL

2009-09-07 Thread Robert Haas
On Mon, Sep 7, 2009 at 9:12 PM, Stephen Frostsfr...@snowman.net wrote:
 Robert,

 * Robert Haas (robertmh...@gmail.com) wrote:
 On Mon, Sep 7, 2009 at 7:47 PM, Tom Lanet...@sss.pgh.pa.us wrote:
  Because as often as not, they're mistakes.  Please don't think
  you're smarter than the spec here.

 You're frequently the first to criticize the spec, but I have no
 interest in second-guessing whatever behavior the spec specifies for
 this construct.

 I'm not entirely sure you followed what Tom was getting at here.  If you
 did, feel free to ignore me.

 I'm just trying to understand it, and as far as I can
 tell, LATERAL() is just a piece of syntactic sugar that allows
 expressions within to reference FROM items at the same query level.

 What I'm gathering is that this may be correct, though I don't know for
 sure.  The point I think Tom was making is that even if it *is* just
 syntactic sugar, we don't want to allow expressions to reference FROM
 items at the same query level *unless* LATERAL is specified.  Your
 earlier comments sounded like you would want to implement allowing
 expressions to refer to FROM items at the same query level without
 LATERAL being specified.

Fair enough.  I think I started to drift off in the direction of
making that argument, but it wasn't really my point.  The original
point I was trying to make is that we may not need to invent any kind
of new name-resolution or scoping in order to make LATERAL() work.
Instead, LATERAL() can just do everything normally with the exception
of not throwing the following errors when they would otherwise be
thrown:

subquery in FROM cannot refer to other relations of the same query level
function expression in FROM cannot refer to other relations of the
same query level

I'm not sure if I'm right about this, but if I am, it seems likely to
be a pretty straightforward change.

 I haven't got a copy of the spec, so that's a bit of a handicap.  If
 someone who does can look this up and comment on how it's supposed to
 work, I would certainly appreciate that.  My understanding of it is
 currently based on random articles cherry-picked around the Internet
 and a handful of emails from archives.postgresql.org, which seems a
 little thin.

 You can get a 'draft' that's very close to the spec pretty easily..
 Just do '??sql' in IRC sometime..

Thanks for the tip.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-07 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 Fair enough.  I think I started to drift off in the direction of
 making that argument, but it wasn't really my point.  

To be honest, I'm not sure I agree with Tom here on the value of
requiring a keyword to tell the system that you really mean what you
wrote.  On the other hand, it sounds like the spec is pretty clear on
this, and I don't feel we should violate it just because we think it's
being silly on this point.

 The original
 point I was trying to make is that we may not need to invent any kind
 of new name-resolution or scoping in order to make LATERAL() work.
 Instead, LATERAL() can just do everything normally with the exception
 of not throwing the following errors when they would otherwise be
 thrown:

I don't know for sure, but I do hope you're right.  I'd certainly love
to be able to do this in general..  There's a number of cases where I've
had to do the hokey-pokey to get around our lack of ability to do this
when using set-returning functions.

 I'm not sure if I'm right about this, but if I am, it seems likely to
 be a pretty straightforward change.

Please continue to explore it and propose a patch. :)

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] LATERAL

2009-09-07 Thread Robert Haas
On Mon, Sep 7, 2009 at 10:08 PM, Stephen Frostsfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 Fair enough.  I think I started to drift off in the direction of
 making that argument, but it wasn't really my point.

 To be honest, I'm not sure I agree with Tom here on the value of
 requiring a keyword to tell the system that you really mean what you
 wrote.  On the other hand, it sounds like the spec is pretty clear on
 this, and I don't feel we should violate it just because we think it's
 being silly on this point.

That's pretty much where I am, too, modulo needing to better
understand the aforesaid spec.

 The original
 point I was trying to make is that we may not need to invent any kind
 of new name-resolution or scoping in order to make LATERAL() work.
 Instead, LATERAL() can just do everything normally with the exception
 of not throwing the following errors when they would otherwise be
 thrown:

 I don't know for sure, but I do hope you're right.  I'd certainly love
 to be able to do this in general..  There's a number of cases where I've
 had to do the hokey-pokey to get around our lack of ability to do this
 when using set-returning functions.

Me too.

 I'm not sure if I'm right about this, but if I am, it seems likely to
 be a pretty straightforward change.

 Please continue to explore it and propose a patch. :)

Yeah, that's the not-so-easy part.  Gotta grok this executor stuff
first before I can even think about implementing this.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] LATERAL

2009-09-06 Thread Robert Haas
I've attempted to search the archives for references to the SQL
LATERAL feature, which AIUI is fairly-frequently requested.  Most of
the discussion that I've found harks back to 2003, and that seems to
discuss more the need for eventual support of the feature than exactly
what it is or how to implement it.  Looking around the web a little, I
found these links:

http://farrago.sourceforge.net/design/CollectionTypes.html
http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/

Based on reading through this discussion, it appears that LATERAL is
mostly a bit of syntactic sugar that requests that the parser allow
you to reference tables at the same query level.  Assuming that the
necessary executor support were present (which it's currently not),
it's unclear to me why one couldn't simply allow such references
unconditionally.  We currently explicitly block such references in
parse_clause.c, but the comments indicate that this is for reasons of
standards-conformance, not semantic ambiguity.

It appears that we're not completely without the ability to handle
queries of this type.  For example, this works:

select g, generate_series(1,g) as h from generate_series(1,10) g;

But this doesn't:

select g, h from generate_series(1,10) g, generate_series(1,g) h;

Just for kicks, I tried removing the code that throws a syntax error
on the latter query, which resulted in a core dump inside
ExecEvalVar(), execQual.c:546, trying to deference a TupleTableSlot.
I'm guessing that this is because it can't get access to the right
variables - the plan actually looks quite sane:

rhaas=# explain select g, h from generate_series(1,10) g,
generate_series(1,g) h;
   QUERY PLAN

 Nested Loop  (cost=0.00..22512.50 rows=100 width=8)
   -  Function Scan on generate_series g  (cost=0.00..12.50 rows=1000 width=4)
   -  Function Scan on generate_series h  (cost=0.00..12.50 rows=1000 width=4)
(3 rows)

The first query just shows up a function scan, which means there's
some projection operation happening here that isn't exposed at the
plan level.  I'm guessing that what needs to happen here is that the
planner needs to be taught that LATERAL queries can only be
implemented as a nested loop (right? unless they're not really using
the LATERAL-ness...) and that the executor needs to be taught how to
make the vars for the outer side of a nestloop visible to the inner
side, when necessary.

Has anyone poked at this at all?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LATERAL

2009-09-06 Thread David Fetter
On Sun, Sep 06, 2009 at 11:59:22PM -0400, Robert Haas wrote:
 I've attempted to search the archives for references to the SQL
 LATERAL feature, which AIUI is fairly-frequently requested.
 [snip]
 Has anyone poked at this at all?

I believe Andrew (RhodiumToad) Gierth is taking a look at
implementing the full LATERAL feature from SQL:2008.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers