Re: parallelizing subplan execution (was: [HACKERS] explain and PARAM_EXEC)

2010-06-30 Thread Mark Wong
On Sat, Jun 26, 2010 at 6:01 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 25, 2010 at 10:47 PM, Mark Wong mark...@gmail.com wrote:
 http://pages.cs.wisc.edu/~dewitt/includes/publications.html

 Some of these papers aren't the type of parallelism we're talking
 about here, but the ones that I think are relevant talk mostly about
 parallelizing hash based joins.  I think we might be lacking an
 operator or two though in order to do some of these things.

 This part (from the first paper linked on that page) is not terribly
 encouraging.

 Current database query optimizers do not consider all possible plans
 when optimizing a relational query. While cost models for relational
 queries running on a single processor are now well-understood
 [SELI79], they still depend on cost estimators that are a guess at
 best. Some dynamically select from among several plans at run time
 depending on, for example, the amount of physical memory actually
 available and the cardinalities of the intermediate results [GRAE89].
 To date, no query optimizers consider all the parallel algorithms for
 each operator and all the query tree organizations. More work is
 needed in this area.

 The section (from that same paper) on parallelizing hash joins and
 merge-join-over-sort is interesting, and I can definitely imagine
 those techniques being a win for us.  But I'm not too sure how we'd
 know when to apply them - that is, what algorithm would the query
 optimizer use?  I'm sure we could come up with something, but I'd get
 a warmer, fuzzier feeling if we could implement the fruits of someone
 else's research rather than rolling our own.

I found another starting point for more papers here:

http://infolab.stanford.edu/joker/joqrs.html

The links on this page don't work anymore but many of these are easily
found by searching for the title.  I've only gone through some
abstracts so far, but it seems to me that they discuss some query
optimization techniques for parallel systems.

Regards,
Mark

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: parallelizing subplan execution (was: [HACKERS] explain and PARAM_EXEC)

2010-06-27 Thread Simon Riggs
On Sat, 2010-06-26 at 21:01 -0400, Robert Haas wrote:

 The section (from that same paper) on parallelizing hash joins and
 merge-join-over-sort is interesting, and I can definitely imagine
 those techniques being a win for us.  But I'm not too sure how we'd
 know when to apply them - that is, what algorithm would the query
 optimizer use?  I'm sure we could come up with something, but I'd get
 a warmer, fuzzier feeling if we could implement the fruits of someone
 else's research rather than rolling our own.

You've just touched on why parallel query is hard. There is a big bucket
of executor code to write and then lots of very subtle thinking,
heuristics and usability parameters to make parallel query sensibly
optimised. You need both to make it actually work in practice (without
hints).

Parallel subplan execution is not a good case to start with because it
presumes only certain kinds of plans are in place. It wouldn't be usable
for the majority of plans.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: parallelizing subplan execution (was: [HACKERS] explain and PARAM_EXEC)

2010-06-26 Thread Robert Haas
On Fri, Jun 25, 2010 at 10:47 PM, Mark Wong mark...@gmail.com wrote:
 http://pages.cs.wisc.edu/~dewitt/includes/publications.html

 Some of these papers aren't the type of parallelism we're talking
 about here, but the ones that I think are relevant talk mostly about
 parallelizing hash based joins.  I think we might be lacking an
 operator or two though in order to do some of these things.

This part (from the first paper linked on that page) is not terribly
encouraging.

Current database query optimizers do not consider all possible plans
when optimizing a relational query. While cost models for relational
queries running on a single processor are now well-understood
[SELI79], they still depend on cost estimators that are a guess at
best. Some dynamically select from among several plans at run time
depending on, for example, the amount of physical memory actually
available and the cardinalities of the intermediate results [GRAE89].
To date, no query optimizers consider all the parallel algorithms for
each operator and all the query tree organizations. More work is
needed in this area.

The section (from that same paper) on parallelizing hash joins and
merge-join-over-sort is interesting, and I can definitely imagine
those techniques being a win for us.  But I'm not too sure how we'd
know when to apply them - that is, what algorithm would the query
optimizer use?  I'm sure we could come up with something, but I'd get
a warmer, fuzzier feeling if we could implement the fruits of someone
else's research rather than rolling our own.

 I'm also ignoring the difficulties of getting hold of a second backend
 in the right state - same database, same snapshot, etc.  It seems to
 me unlikely that there are a substantial number of real-world
 applications for which this will work very well if we have to
 actually start a new backend every time we want to parallelize a
 query.  IOW, we're going to need, well, a connection pool in core.
 *ducks, runs for cover*

 Do we think it's worth proving that we can execute a plan in
 parallel?  Something simple, if not the best case, say a nested loop
 join between two tables?  Just as a starting point before worrying too
 much about what is the best thing to parallelize, or how the degree of
 parallelism will be controlled?

Well, we can certainly DO it, I guess.  It's just a question of
whether we can make it fairly automatic and capable of delivering good
results in the real world.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: parallelizing subplan execution (was: [HACKERS] explain and PARAM_EXEC)

2010-06-25 Thread Mark Wong
Hi all,

Sorry for jumping in over 4 months later...

On Sat, Feb 20, 2010 at 8:25 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 20, 2010 at 8:31 AM, Dimitri Fontaine
 dfonta...@hi-media.com wrote:
 This is really a topic for another thread, but at 100,000 feet it
 seems to me that the hardest question is - how will you decide which
 operations to parallelize in the first place?  Actually making it
 happen is really hard, too, of course, but even to get to that point
 you have to have some model for what types of operations it makes
 sense to parallelize and how you're going to decide when it's a win.

 My naive thought would be to add some cost parameters: first the cost
 of fork()ing another backend, then, for each supported subplan (we
 will want to add more, or maybe have a special rendezvous-materialise
 node), some model of the data exchange cost.

 Now the planner would, as usual, try to find the least costly plan,
 and would be able to compare plans with and without distributing the
 work.

 Overly naive?

 Probably.  For one thing, you can't use fork(), because it won't work
 on Windows.

 It seems to me that you need to start by thinking about what kinds of
 queries could be usefully parallelized.  What I think you're proposing
 here, modulo large amounts of hand-waving, is that we should basically
 find a branch of the query tree, cut it off, and make that branch the
 responsibility of a subprocess.  What kinds of things would be
 sensible to hand off in this way?  Well, you'd want to find nodes that
 are not likely to be repeatedly re-executed with different parameters,
 like subplans or inner-indexscans, because otherwise you'll get
 pipeline stalls handing the new parameters back and forth.  And you
 want to find nodes that are expensive for the same reason.  So maybe
 this would work for something like a merge join on top of two sorts -
 one backend could perform each sort, and then whichever one was the
 child would stream the tuples to the parent for the final merge.  Of
 course, this assumes the I/O subsystem can keep up, which is not a
 given - if both tables are fed by the same, single spindle, it might
 be worse than if you just did the sorts consecutively.

 This approach might also benefit queries that are very CPU-intensive,
 on a multi-core system with spare cycles.  Suppose you have a big tall
 stack of hash joins, each with a small inner rel.  The child process
 does about half the joins and then pipelines the results into the
 parent, which does the other half and returns the results.

 But there's at least one other totally different way of thinking about
 this problem, which is that you might want two processes to cooperate
 in executing the SAME query node - imagine, for example, a big
 sequential scan with an expensive but highly selective filter
 condition, or an enormous sort.  You have all the same problems of
 figuring out when it's actually going to help, of course, but the
 details will likely be quite different.

 I'm not really sure which one of these would be more useful in
 practice - or maybe there are even other strategies.  What does
 $COMPETITOR do?

I feel that the answer is it depends.  To partially answer what others
are doing, I'll present some papers from someone we might recognize as
a starting point. :)

http://pages.cs.wisc.edu/~dewitt/includes/publications.html

Some of these papers aren't the type of parallelism we're talking
about here, but the ones that I think are relevant talk mostly about
parallelizing hash based joins.  I think we might be lacking an
operator or two though in order to do some of these things.

 I'm also ignoring the difficulties of getting hold of a second backend
 in the right state - same database, same snapshot, etc.  It seems to
 me unlikely that there are a substantial number of real-world
 applications for which this will work very well if we have to
 actually start a new backend every time we want to parallelize a
 query.  IOW, we're going to need, well, a connection pool in core.
 *ducks, runs for cover*

Do we think it's worth proving that we can execute a plan in
parallel?  Something simple, if not the best case, say a nested loop
join between two tables?  Just as a starting point before worrying too
much about what is the best thing to parallelize, or how the degree of
parallelism will be controlled?

Regards,
Mark



Re: [HACKERS] explain and PARAM_EXEC

2010-04-12 Thread Yeb Havinga
The patch I sent earlier is flawed with respect to subplan parameter 
numbering; I counted from zero where the parParam list had to be used.


Yeb Havinga wrote: 

See patch below against HEAD.

Example of query against catalog:

postgres=# explain verbose select oid::int + 1,(select oid from 
pg_class a where a.oid = b.relfilenode and a.relnamespace = 
b.relnamespace) from pg_class b;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..2459.64 rows=296 width=12)
   Output: ((b.oid)::integer + 1), SubPlan 1 ($0 := b.relfilenode, $1 := b.relnamespace)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = $0)
           Filter: (a.relnamespace = $1)
(7 rows)

Explain of alternative subplan (query comes from aggregates.sql in 
regression).


regression=# explain verbose select ten, sum(distinct four) from onek a
group by ten
having exists (select 1 from onek b where sum(distinct a.four) = b.four);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 GroupAggregate  (cost=197.83..216.52 rows=10 width=8)
   Output: a.ten, sum(DISTINCT a.four)
   Filter: (alternatives: SubPlan 1 ($0 := sum(DISTINCT a.four)) or hashed SubPlan 2)
   ->  Sort  (cost=197.83..200.33 rows=1000 width=8)
         Output: a.ten, a.four
         Sort Key: a.ten
         ->  Seq Scan on public.onek a  (cost=0.00..148.00 rows=1000 width=8)
               Output: a.ten, a.four
   SubPlan 1
     ->  Seq Scan on public.onek b  (cost=0.00..150.50 rows=250 width=0)
           Filter: ($0 = b.four)
   SubPlan 2
     ->  Seq Scan on public.onek b  (cost=0.00..148.00 rows=1000 width=4)
           Output: b.four
(14 rows)

Would the explain above be better if the filter with subplan 
arguments showed the EXISTS keyword? Much code from get_sublink_expr 
could be reused to show additional info depending on the sublink type 
and testexpr.

Other info: there are now only appends to the context->buf. The 
problems with OUTER var I mentioned earlier are gone now that arguments 
are deparsed using get_rule_expr instead of deparse_expression.


regards,
Yeb Havinga






Re: [HACKERS] explain and PARAM_EXEC

2010-04-02 Thread Yeb Havinga

Tom Lane wrote:

Anyway, I hadn't looked at your patch before, but now that I have, it's
not even approximately what I was suggesting.  What I thought you should
do was change ruleutils.c to print the parameter expressions at the call
site, ie in the T_SubPlan and T_AlternativeSubPlan cases in get_rule_expr.
  

See patch below against HEAD.

Example of query against catalog:

postgres=# explain verbose select oid::int + 1,(select oid from pg_class 
a where a.oid = b.relfilenode and a.relnamespace = b.relnamespace) from 
pg_class b;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..2459.64 rows=296 width=12)
   Output: ((b.oid)::integer + 1), SubPlan 1 ($0 := b.relfilenode, $1 := b.relnamespace)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = $0)
           Filter: (a.relnamespace = $1)
(7 rows)

Explain of alternative subplan (query comes from aggregates.sql in 
regression).


regression=# explain verbose select ten, sum(distinct four) from onek a
group by ten
having exists (select 1 from onek b where sum(distinct a.four) = b.four);
                                    QUERY PLAN
----------------------------------------------------------------------------------
 GroupAggregate  (cost=197.83..216.52 rows=10 width=8)
   Output: a.ten, sum(DISTINCT a.four)
   Filter: (alternatives: SubPlan 1 ($0 := sum(DISTINCT a.four)) or hashed SubPlan 2)
   ->  Sort  (cost=197.83..200.33 rows=1000 width=8)
         Output: a.ten, a.four
         Sort Key: a.ten
         ->  Seq Scan on public.onek a  (cost=0.00..148.00 rows=1000 width=8)
               Output: a.ten, a.four
   SubPlan 1
     ->  Seq Scan on public.onek b  (cost=0.00..150.50 rows=250 width=0)
           Filter: ($0 = b.four)
   SubPlan 2
     ->  Seq Scan on public.onek b  (cost=0.00..148.00 rows=1000 width=4)
           Output: b.four
(14 rows)

Would the explain above be better if the filter with subplan arguments 
showed the EXISTS keyword? Much code from get_sublink_expr could be 
reused to show additional info depending on the sublink type and testexpr.

Other info: there are now only appends to the context->buf. The problems 
with OUTER var I mentioned earlier are gone now that arguments are 
deparsed using get_rule_expr instead of deparse_expression.


regards,
Yeb Havinga

diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index d16f1c4..68302e4 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -206,6 +206,7 @@ static void get_const_expr(Const *constval, deparse_context *context,
 			   int showtype);
 static void simple_quote_literal(StringInfo buf, const char *val);
 static void get_sublink_expr(SubLink *sublink, deparse_context *context);
+static void get_subplan_reference(SubPlan *subplan, deparse_context *context);
 static void get_from_clause(Query *query, const char *prefix,
 			deparse_context *context);
 static void get_from_clause_item(Node *jtnode, Query *query,
@@ -4647,20 +4648,13 @@ get_rule_expr(Node *node, deparse_context *context,
 			break;
 
 		case T_SubPlan:
-			{
-				SubPlan    *subplan = (SubPlan *) node;
-
-				/*
-				 * We cannot see an already-planned subplan in rule deparsing,
-				 * only while EXPLAINing a query plan.  We don't try to
-				 * reconstruct the original SQL, just reference the subplan
-				 * that appears elsewhere in EXPLAIN's result.
-				 */
-				if (subplan->useHashTable)
-					appendStringInfo(buf, "(hashed %s)", subplan->plan_name);
-				else
-					appendStringInfo(buf, "(%s)", subplan->plan_name);
-			}
+			/*
+			 * We cannot see an already-planned subplan in rule deparsing, only
+			 * while EXPLAINing a query plan. We don't try to reconstruct the
+			 * original SQL, just reference the subplan that appears elsewhere
+			 * in EXPLAIN's result.
+			 */
+			get_subplan_reference((SubPlan *) node, context);
 			break;
 
 		case T_AlternativeSubPlan:
@@ -4673,12 +4667,9 @@ get_rule_expr(Node *node, deparse_context *context,
 			foreach(lc, asplan->subplans)

Re: [HACKERS] explain and PARAM_EXEC

2010-04-01 Thread Yeb Havinga




Tom Lane wrote:
 In principle it could look something like

 (SubPlan N ($0 := b.oid))

 but with a few parameters and a bunch of other stuff on the same line
 that would get out of hand.

The patch I submitted to implement this hits "bogus varno: 65001" in
get_variable when I explain the following query from the aggregates.sql
regression test.

select ten, sum(distinct four) from onek a
group by ten
having exists (select 1 from onek b where sum(distinct a.four) =
b.four);

The explain without the subplan-argument patch shows
regression=# explain verbose select ten, sum(distinct four) from
onek a
group by ten
having exists (select 1 from onek b where sum(distinct a.four) =
b.four);
                                   QUERY PLAN
------------------------------------------------------------------------------
 GroupAggregate  (cost=197.83..216.52 rows=10 width=8)
   Output: a.ten, sum(DISTINCT a.four)
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   ->  Sort  (cost=197.83..200.33 rows=1000 width=8)
         Output: a.ten, a.four
         Sort Key: a.ten
         ->  Seq Scan on public.onek a  (cost=0.00..148.00 rows=1000 width=8)
               Output: a.ten, a.four
   SubPlan 1
     ->  Seq Scan on public.onek b  (cost=0.00..150.50 rows=250 width=0)
           Filter: ($0 = b.four)
   SubPlan 2
     ->  Seq Scan on public.onek b  (cost=0.00..148.00 rows=1000 width=4)
           Output: b.four
(14 rows)

The subplan argument list contains an Aggref node, with a Var argument
that has the 65001 varno. Apparently the OUTER varno is set by
search_indexed_tlist_for_var and it seems like correct behaviour; see
stack trace below.

I'm unsure what a good approach to solve this problem is: should
get_variable not give an error in this case?
regards,
Yeb Havinga




#0 search_indexed_tlist_for_var (var=0xe32dc0, itlist=0xe67290,
newvarno=65001, rtoffset=0)
 at setrefs.c:1568
#1 0x00697581 in fix_upper_expr_mutator (node=0xe32dc0,
context=0x7fffabcde850)
 at setrefs.c:1853
#2 0x00697529 in fix_upper_expr (glob=0xe77178, node=0xe32dc0,
subplan_itlist=0xe67290, 
 rtoffset=0) at setrefs.c:1839
#3 0x00696b6e in set_upper_references (glob=0xe77178,
plan=0xe66f80, rtoffset=0)



Re: [HACKERS] explain and PARAM_EXEC

2010-04-01 Thread Tom Lane
Yeb Havinga yebhavi...@gmail.com writes:
 !DOCTYPE html PUBLIC -//W3C//DTD HTML 4.01 Transitional//EN
 html
 head
   meta content=text/html;charset=ISO-8859-1 http-equiv=Content-Type
 /head
 body bgcolor=#ff text=#00
 Tom Lane wrote:
 br
 blockquote cite=mid:4ba361f2.5010...@gmail.com type=cite
   blockquote type=citeIn principle it could look something like

[ etc etc ]

Please do not send HTML-only email to the lists.  It's a PITA to work with.

Anyway, I hadn't looked at your patch before, but now that I have, it's
not even approximately what I was suggesting.  What I thought you should
do was change ruleutils.c to print the parameter expressions at the call
site, ie in the T_SubPlan and T_AlternativeSubPlan cases in get_rule_expr.

regards, tom lane



Re: [HACKERS] explain and PARAM_EXEC

2010-03-19 Thread Yeb Havinga

Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:
  

So I guess there are two issues here: (1) somehow I feel like we
should be telling the user what expression is being used to initialize
$0, $1, etc. when they are PARAM_EXEC parameters;



Maybe, but the only reasonable place to put it would be within the
(SubPlan N) reference, which is not a place where verboseness would be
appreciated, I think.  In principle it could look something like

(SubPlan N ($0 := b.oid))

but with a few parameters and a bunch of other stuff on the same line
that would get out of hand.
  
I'm currently digesting subplan/initplan handling and I really like 
Robert's idea of showing actual parameters.


postgres=# explain select oid::int + 1,(select oid from pg_class a where 
a.oid = b.relfilenode and a.relnamespace = b.relnamespace) from pg_class b;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on pg_class b  (cost=0.00..2459.64 rows=296 width=12)
   SubPlan 1 ($0 := b.relfilenode, $1 := b.relnamespace)
     ->  Index Scan using pg_class_oid_index on pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Index Cond: (oid = $0)
           Filter: (relnamespace = $1)
(5 rows)

Only changes are in ExplainSubPlans; all regression tests passed (which 
surprised me a bit -- no explains with subplans in expected results?). 
NB: this is not a patch against HEAD but a local version, so line numbers 
are off; patch supplied for discussion purposes only.


kind regards,
Yeb Havinga

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 56d9c5b..454d59b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1686,20 +1686,47 @@ static void
 ExplainSubPlans(List *plans, const char *relationship, ExplainState *es)
 {
 	ListCell   *lst;
 
 	foreach(lst, plans)
 	{
 		SubPlanState *sps = (SubPlanState *) lfirst(lst);
-		SubPlan    *sp = (SubPlan *) sps->xprstate.expr;
+		SubPlan    *sp = (SubPlan *) sps->xprstate.expr;
+		StringInfo	signature = makeStringInfo();
+		int			i = 0;
+		List	   *context;
+		bool		useprefix;
+		ListCell   *c;
+
+		context = deparse_context_for_plan((Node *) exec_subplan_get_plan(es->pstmt, sp),
+										   NULL,
+										   es->rtable,
+										   es->pstmt->subplans);
+		useprefix = list_length(es->rtable) > 1;
+
+		appendStringInfoString(signature, sp->plan_name);
+
+		foreach(c, sp->args)
+		{
+			Node	   *n = lfirst(c);
+
+			appendStringInfo(signature, "%s$%d := %s",
+							 (i == 0) ? " (" : ", ",
+							 i,
+							 deparse_expression(n, context, useprefix, true));
+			i++;
+		}
+
+		if (i > 0)
+			appendStringInfoString(signature, ")");
 
 		ExplainNode(exec_subplan_get_plan(es->pstmt, sp),
 					sps->planstate,
 					NULL,
-					relationship, sp->plan_name,
+					relationship,
+					signature->data,
 					es);
 	}
 }

/*
 * Explain a property, such as sort keys or targets, that takes the form of
 * a list of unlabeled items.  data is a list of C strings.




Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 It's really not much different from a function call with subplans as
 functions.  The PARAM_EXEC stuff looks just like 1950's era
 non-reentrant function parameter passing mechanisms, back before anybody
 had thought of recursive functions and they passed a function's
 parameters in fixed storage locations.  It's okay for this because
 subplan trees are never recursive ...

hand waving alert

How much is this stuff dependent on the current state of the
backend? If not that much, maybe the planner could consider costs of
having another backend started to handle the subplan. We'd need a
tuplestore or some other place to exchange results (a unix style pipe
maybe, but we do support windows as a platform), and a special executor
mode for running a subplan, maybe more than once. Oh, and a way to share
the same snapshot in more than one backend too, but that's being worked
on I think.

Ok that's a far stretch from the question at hand, but would that be a
plausible approach to have parallel queries in PostgreSQL ?

Regards,
-- 
dim



Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Greg Stark
On Sat, Feb 20, 2010 at 4:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's really not much different from a function call with subplans as
 functions.

Perhaps it would be clearer to display the (Subplan 1) in a function
call style format like Subplan1(b.oid)

-- 
greg



Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Robert Haas
On Fri, Feb 19, 2010 at 11:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Feb 19, 2010 at 11:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We can also fetch that tuple's
 relfilenode and pass it to the subplan, which we do by setting the $0
 Param value before invoking the subplan.

 Are the same tuples in scope when evaluating the expression that sets
 $0 as were in scope when evaluating  ((b.oid)::integer + 1)?

 Yes, exactly the same.  The parameter-value expression is just like any
 other scalar expression that could appear where the SubPlan reference
 is.  It doesn't know anything about the subplan, really.

OK, thanks for the explanation. That makes more sense now.  I've got
to go study this some more before I ask my next question...

...Robert



Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Robert Haas
On Sat, Feb 20, 2010 at 6:57 AM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 It's really not much different from a function call with subplans as
 functions.  The PARAM_EXEC stuff looks just like 1950's era
 non-reentrant function parameter passing mechanisms, back before anybody
 had thought of recursive functions and they passed a function's
 parameters in fixed storage locations.  It's okay for this because
 subplan trees are never recursive ...

 hand waving alert

 How much is this stuff dependent on the current state of the
 backend?

A whole lot.

 Ok that's a far stretch from the question at hand, but would that be a
 plausible approach to have parallel queries in PostgreSQL ?

This is really a topic for another thread, but at 100,000 feet it
seems to me that the hardest question is - how will you decide which
operations to parallelize in the first place?  Actually making it
happen is really hard, too, of course, but even to get to that point
you have to have some model for what types of operations it makes
sense to parallelize and how you're going to decide when it's a win.

...Robert



Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Robert Haas
On Sat, Feb 20, 2010 at 7:53 AM, Greg Stark gsst...@mit.edu wrote:
 On Sat, Feb 20, 2010 at 4:33 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's really not much different from a function call with subplans as
 functions.

 Perhaps it would be clearer to display the (Subplan 1) in a function
 call style format like Subplan1(b.oid)

I thought about that, too...  maybe for 9.1 we should consider it.  It
might be nice to add some sort of glyph to make the user less likely
to think that Subplan1 is in fact a function. Subplan1(b.oid)?
${Subplan1}(b.oid)?  I dunno.

...Robert



parallelizing subplan execution (was: [HACKERS] explain and PARAM_EXEC)

2010-02-20 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 On Sat, Feb 20, 2010 at 6:57 AM, Dimitri Fontaine
 dfonta...@hi-media.com wrote:
 How much is this stuff dependent on the current state of the
 backend?

 A whole lot.

Bad news.

 Ok that's a far stretch from the question at hand, but would that be a
 plausible approach to have parallel queries in PostgreSQL ?

 This is really a topic for another thread, but at 100,000 feet it
 seems to me that the hardest question is - how will you decide which
 operations to parallelize in the first place?  Actually making it
 happen is really hard, too, of course, but even to get to that point
 you have to have some model for what types of operations it makes
 sense to parallelize and how you're going to decide when it's a win.

My naive thought would be to add some cost parameters: first the cost of
fork()ing another backend, then, for each supported subplan (we will
want to add more, or maybe have a special rendezvous-materialise node),
some model of the data exchange cost.

Now the planner would, as usual, try to find the least costly plan, and
would be able to compare plans with and without distributing the work.

Overly naive?

Regards,
-- 
dim



Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Feb 20, 2010 at 6:57 AM, Dimitri Fontaine
 Ok, that's a far cry from the question at hand, but would that be a
 plausible approach to having parallel queries in PostgreSQL?

 This is really a topic for another thread, but at 100,000 feet it
 seems to me that the hardest question is - how will you decide which
 operations to parallelize in the first place?  Actually making it
 happen is really hard, too, of course, but even to get to that point
 you have to have some model for what types of operations it makes
 sense to parallelize and how you're going to decide when it's a win.

Indeed, and if I were setting out to parallelize queries in PG (which
I am not), subplans would be the last thing I would think about.  You
could put an enormous amount of work in there and have nothing much
to show for it, because the construct doesn't even arise in many
queries.  Even where the user wrote something that looks like a subplan,
the planner will do its best to get rid of it by turning it into a join.

So if you want to parallelize queries, start someplace else.  The past
discussions of this have revolved around splitting the node tree of an
ordinary query plan into separately executable parts.  Maybe a subplan
could be one of the cut points for such an approach, but if it's the
only one or even the main one, you're wasting your time.

regards, tom lane



Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Feb 20, 2010 at 7:53 AM, Greg Stark gsst...@mit.edu wrote:
 Perhaps it would be clearer to display the (Subplan 1) in a function
 call style format like Subplan1(b.oid)

 I thought about that, too...  maybe for 9.1 we should consider it.  It
 might be nice to add some sort of glyph to make the user less likely
 to think that Subplan1 is in fact a function. Subplan1(b.oid)?
 ${Subplan1}(b.oid)?  I dunno.

You really can't escape the need to identify which $N symbol is
associated with which parameter value.  As soon as you've got more than
one subplan in a query, that becomes a nontrivial thing for a user to
guess.

regards, tom lane



Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Jaime Casanova
On Fri, Feb 19, 2010 at 10:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 So I guess there are two issues here: (1) somehow I feel like we
 should be telling the user what expression is being used to initialize
 $0, $1, etc. when they are PARAM_EXEC parameters;

 Maybe, but the only reasonable place to put it would be within the
 (SubPlan N) reference, which is not a place where verboseness would be
 appreciated, I think.  In principle it could look something like

        (SubPlan N ($0 := b.oid))


What if we put fully qualified names every time we use a reference
from a subplan?  Something like:

                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..2250.22 rows=271 width=4)
   Output: (SubPlan 1)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = subplan1.$0)

-- 
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157



Re: [HACKERS] explain and PARAM_EXEC

2010-02-20 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Indeed, and if I were setting out to parallelize queries in PG (which
 I am not), subplans would be the last thing I would think about.  You
 could put an enormous amount of work in there and have nothing much
 to show for it, because the construct doesn't even arise in many
 queries.  Even where the user wrote something that looks like a subplan,
 the planner will do its best to get rid of it by turning it into a
 join.

I guess that's because subplans cost more than their alternatives. The
idea was to provide a parallel implementation of them, so that they get
some benefit and compare better to plain joins.

But I can see that's an entirely wrong approach, and I'm happy to know
that and glad I asked, thanks :)

 So if you want to parallelize queries, start someplace else.  The past
 discussions of this have revolved around splitting the node tree of an
 ordinary query plan into separately executable parts.  Maybe a subplan
 could be one of the cut points for such an approach, but if it's the
 only one or even the main one, you're wasting your time.

Unless you arrange for the planner to have good (new) reasons to prefer
using subplans, or provide subplan-based joins?

Ok, once you've done that, maybe you're back to the main problem and
have just changed its name.

Regards,
-- 
dim



Re: parallelizing subplan execution (was: [HACKERS] explain and PARAM_EXEC)

2010-02-20 Thread Robert Haas
On Sat, Feb 20, 2010 at 8:31 AM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 This is really a topic for another thread, but at 100,000 feet it
 seems to me that the hardest question is - how will you decide which
 operations to parallelize in the first place?  Actually making it
 happen is really hard, too, of course, but even to get to that point
 you have to have some model for what types of operations it makes
 sense to parallelize and how you're going to decide when it's a win.

 My naive thought would be to add some cost parameters: first, the cost
 of fork()ing another backend, then, for each supported subplan type (we
 will want to add more, or maybe a special rendez-vous-materialise
 node), some model of the data exchange cost.

 Now the planner would, as usual, try to find the least costly plan, and
 would be able to compare plans with and without distributing the work.

 Overly naive?

Probably.  For one thing, you can't use fork(), because it won't work
on Windows.

It seems to me that you need to start by thinking about what kinds of
queries could be usefully parallelized.  What I think you're proposing
here, modulo large amounts of hand-waving, is that we should basically
find a branch of the query tree, cut it off, and make that branch the
responsibility of a subprocess.  What kinds of things would be
sensible to hand off in this way?  Well, you'd want to find nodes that
are not likely to be repeatedly re-executed with different parameters,
like subplans or inner-indexscans, because otherwise you'll get
pipeline stalls handing the new parameters back and forth.  And you
want to find nodes that are expensive for the same reason.  So maybe
this would work for something like a merge join on top of two sorts -
one backend could perform each sort, and then whichever one was the
child would stream the tuples to the parent for the final merge.  Of
course, this assumes the I/O subsystem can keep up, which is not a
given - if both tables are fed by the same single spindle, it might
be worse than if you just did the sorts consecutively.
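The merge-join-over-two-sorts case can be sketched in miniature (a toy
only, nothing like executor code; Python's multiprocessing stands in for
the extra backend, and the join assumes distinct keys for brevity):

```python
# Toy sketch: the two sort inputs of a merge join are sorted in separate
# worker processes (one per input, like a child backend each), and the
# parent then merges the two sorted key streams.
from multiprocessing import Pool

def sorted_run(rows):
    """Stand-in for one backend performing one Sort node."""
    return sorted(rows)

def merge_join_keys(left, right):
    """Merge-join two sorted key streams on equality (distinct keys assumed)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:                      # keys match: emit a join result
            out.append(left[i])
            i += 1
            j += 1
    return out

if __name__ == "__main__":
    left_input = [5, 3, 1, 4]
    right_input = [4, 2, 5, 6]
    with Pool(2) as pool:          # both sorts run concurrently
        left_sorted, right_sorted = pool.map(sorted_run,
                                             [left_input, right_input])
    print(merge_join_keys(left_sorted, right_sorted))  # [4, 5]
```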

This approach might also benefit queries that are very CPU-intensive,
on a multi-core system with spare cycles.  Suppose you have a big tall
stack of hash joins, each with a small inner rel.  The child process
does about half the joins and then pipelines the results into the
parent, which does the other half and returns the results.

But there's at least one other totally different way of thinking about
this problem, which is that you might want two processes to cooperate
in executing the SAME query node - imagine, for example, a big
sequential scan with an expensive but highly selective filter
condition, or an enormous sort.  You have all the same problems of
figuring out when it's actually going to help, of course, but the
details will likely be quite different.
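The "two processes cooperating on the SAME node" idea might look like
this in miniature (a hypothetical sketch; naive chunk splitting stands
in for whatever block-range coordination a real implementation would
need):

```python
# Toy sketch: several workers scan disjoint chunks of the same table,
# each applying an expensive, highly selective filter; the parent
# concatenates the surviving rows.
from multiprocessing import Pool

def expensive_filter(chunk):
    # stand-in for a costly qual; here: keep multiples of 1000
    return [row for row in chunk if row % 1000 == 0]

def parallel_scan(table, workers=2):
    """Split the table into one chunk per worker and filter in parallel."""
    size = (len(table) + workers - 1) // workers
    chunks = [table[i:i + size] for i in range(0, len(table), size)]
    with Pool(workers) as pool:
        results = pool.map(expensive_filter, chunks)
    return [row for part in results for row in part]   # preserve table order

if __name__ == "__main__":
    print(parallel_scan(list(range(5000))))  # [0, 1000, 2000, 3000, 4000]
```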

I'm not really sure which one of these would be more useful in
practice - or maybe there are even other strategies.  What does
$COMPETITOR do?

I'm also ignoring the difficulties of getting hold of a second backend
in the right state - same database, same snapshot, etc.  It seems
unlikely to me that this will work very well for a substantial number of
real-world applications if we have to actually start a new backend every
time we want to parallelize a query.  IOW, we're going to need, well, a
connection pool in core.
*ducks, runs for cover*

...Robert



[HACKERS] explain and PARAM_EXEC

2010-02-19 Thread Robert Haas
Consider the following (rather lame) query:

rhaas=# explain (verbose) select (select oid from pg_class a where
a.oid = b.relfilenode) from pg_class b;
                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..2250.22 rows=271 width=4)
   Output: (SubPlan 1)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = $0)

It seems odd to me that we don't display any information about where
$0 comes from or how it's initialized.  Should we?  I believe what's
happening is that the sequential scan of b kicks out b.oid, and that
then gets yanked into $0 when we invoke the subplan.  But you can't
really see what's happening.  Interestingly, if you contrive to make
the sequential scan not the toplevel plan node, then you actually do
get to see what it's kicking out:

rhaas=# explain (verbose) select (select oid from pg_class a where
a.oid = b.oid::integer) from pg_class b, generate_series(1,5);
                                 QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2245943.89 rows=271000 width=4)
   Output: (SubPlan 1)
   ->  Seq Scan on pg_catalog.pg_class b  (cost=0.00..9.71 rows=271 width=4)
         Output: b.oid
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0)
         Output: generate_series.generate_series
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = ($0)::oid)
(10 rows)

We can even make it kick out two things:

rhaas=# explain (verbose) select (select oid from pg_class a where
a.oid = b.oid::integer + b.relfilenode::integer) from pg_class b,
generate_series(1,5);
                                 QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2246621.39 rows=271000 width=8)
   Output: (SubPlan 1)
   ->  Seq Scan on pg_catalog.pg_class b  (cost=0.00..9.71 rows=271 width=8)
         Output: b.oid, b.relfilenode
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0)
         Output: generate_series.generate_series
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = ((($0)::integer + ($1)::integer))::oid)
(10 rows)

But if we drop the generate_series call we're back in the dark -
where's the node that's emitting oid and relfilenode?

rhaas=# explain (verbose) select (select oid from pg_class a where
a.oid = b.oid::integer + b.relfilenode::integer) from pg_class b;
                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..2250.90 rows=271 width=8)
   Output: (SubPlan 1)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = ((($0)::integer + ($1)::integer))::oid)
(6 rows)

So I guess there are two issues here: (1) somehow I feel like we
should be telling the user what expression is being used to initialize
$0, $1, etc. when they are PARAM_EXEC parameters; and (2) where does
the output list for the sequential scan go when there's only one
table involved?

This is when you all start explaining to me why I'm woefully confused...

...Robert



Re: [HACKERS] explain and PARAM_EXEC

2010-02-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 So I guess there are two issues here: (1) somehow I feel like we
 should be telling the user what expression is being used to initialize
 $0, $1, etc. when they are PARAM_EXEC parameters;

Maybe, but the only reasonable place to put it would be within the
(SubPlan N) reference, which is not a place where verboseness would be
appreciated, I think.  In principle it could look something like

(SubPlan N ($0 := b.oid))

but with a few parameters and a bunch of other stuff on the same line
that would get out of hand.

 and (2) where does
 the output list for the sequential scan go when there's only one
 table involved?

The (SubPlan N) reference is meant to be understood as an expression
element yielding the output of the subplan.  One of us is confused,
because I don't see any material difference between your examples;
they're all calling the subplan in the same way.

regards, tom lane



Re: [HACKERS] explain and PARAM_EXEC

2010-02-19 Thread Robert Haas
On Fri, Feb 19, 2010 at 10:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 So I guess there are two issues here: (1) somehow I feel like we
 should be telling the user what expression is being used to initialize
 $0, $1, etc. when they are PARAM_EXEC parameters;

 Maybe, but the only reasonable place to put it would be within the
 (SubPlan N) reference, which is not a place where verboseness would be
 appreciated, I think.  In principle it could look something like

        (SubPlan N ($0 := b.oid))

 but with a few parameters and a bunch of other stuff on the same line
 that would get out of hand.

I thought maybe it could do something like this:

SubPlan 1
  Parameters: $0 := b.oid
  -> Index Scan etc.

...but maybe that doesn't work if it can be called with different
parameters from different places?  Not sure if that's possible.

 and (2) where does
 the output list for the sequential scan go when there's only one
 table involved?

 The (SubPlan N) reference is meant to be understood as an expression
 element yielding the output of the subplan.  One of us is confused,
 because I don't see any material difference between your examples;
 they're all calling the subplan in the same way.

I don't think it's a stretch to say that I'm the one who is confused.
I am under the (perhaps faulty) impression that when evaluating an
expression there can only ever be three tuples in score: inner, outer,
and scan.  So when we go to evaluate the expression whose result will
be assigned to $0, where do we get those inner and/or outer and/or
scan tuples from?  IOW, I understand where the subplan is putting its
OUTPUT, what I don't understand is what context is being used to set
its input parameters.

...Robert



Re: [HACKERS] explain and PARAM_EXEC

2010-02-19 Thread Robert Haas
On Fri, Feb 19, 2010 at 11:05 PM, Robert Haas robertmh...@gmail.com wrote:
 I don't think it's a stretch to say that I'm the one who is confused.
 I am under the (perhaps faulty) impression that when evaluating an
 expression there can only ever be three tuples in score: inner, outer,

s/score/scope.

 and scan.  So when we go to evaluate the expression whose result will
 be assigned to $0, where do we get those inner and/or outer and/or
 scan tuples from?  IOW, I understand where the subplan is putting its
 OUTPUT, what I don't understand is what context is being used to set
 its input parameters.

...Robert



Re: [HACKERS] explain and PARAM_EXEC

2010-02-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Feb 19, 2010 at 10:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Maybe, but the only reasonable place to put it would be within the
 (SubPlan N) reference,

 I thought maybe it could do something like this:

 SubPlan 1
   Parameters: $0 := b.oid
    -> Index Scan etc.

No, that's the wrong end of the stick --- that's like trying to annotate
a function definition with the actual parameter values being passed to
it from somewhere else.  You haven't got the info there, and even if you
did, it's assuming that there is exactly one call site for any subplan.

 I am under the (perhaps faulty) impression that when evaluating an
 expression there can only ever be three tuples in score: inner, outer,
 and scan.  So when we go to evaluate the expression whose result will
 be assigned to $0, where do we get those inner and/or outer and/or
 scan tuples from?  IOW, I understand where the subplan is putting its
 OUTPUT, what I don't understand is what context is being used to set
 its input parameters.

Consider this small mod on your example:

regression=# explain (verbose) select oid::int + 1,(select oid from pg_class a 
where a.oid = b.relfilenode) from pg_class b;
                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..5573.04 rows=671 width=8)
   Output: ((b.oid)::integer + 1), (SubPlan 1)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = $0)
(6 rows)

When we are evaluating the output targetlist of the seqscan node, we
have a scan tuple of pg_class b in scope.  We can fetch that tuple's
oid and use it in the first expression.  We can also fetch that tuple's
relfilenode and pass it to the subplan, which we do by setting the $0
Param value before invoking the subplan.  The subplan runs an indexscan
and returns a single scalar value (to wit, a.oid from some row of
pg_class a), which becomes the value of the (SubPlan 1) reference
back at the evaluation of the seqscan's targetlist.

It's really not much different from a function call with subplans as
functions.  The PARAM_EXEC stuff looks just like 1950's era
non-reentrant function parameter passing mechanisms, back before anybody
had thought of recursive functions and they passed a function's
parameters in fixed storage locations.  It's okay for this because
subplan trees are never recursive ...
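This analogy to pre-reentrancy calling conventions can be sketched in a
few lines (a toy model only; names like param_exec and subplan_1 are
invented for illustration and are nothing like the real executor):

```python
# Toy model of PARAM_EXEC: parameter values live in fixed, shared slots,
# non-reentrantly, like 1950s calling conventions. The caller stores a
# value into slot $N, then invokes the subplan, which reads it by number.

param_exec = {}   # fixed storage locations: param number -> current value

def subplan_1(index):
    """Hypothetical 'Index Scan ... Index Cond: (a.oid = $0)'."""
    key = param_exec[0]        # read $0 from its fixed slot
    return index.get(key)      # missing key -> None, like an empty subquery

def outer_targetlist(scan_tuple, index):
    """Evaluate the seqscan's targetlist: the scan tuple of b is in scope."""
    param_exec[0] = scan_tuple["relfilenode"]   # set $0 := b.relfilenode
    return subplan_1(index)                     # the (SubPlan 1) reference

index_on_oid = {16384: 16384, 16385: 16385}     # stand-in for the oid index
print(outer_targetlist({"oid": 1, "relfilenode": 16384}, index_on_oid))
```

Because the slots are global, this only works as long as no subplan can
end up invoking itself, exactly the non-recursive property noted above.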

regards, tom lane



Re: [HACKERS] explain and PARAM_EXEC

2010-02-19 Thread Robert Haas
On Fri, Feb 19, 2010 at 11:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Feb 19, 2010 at 10:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Maybe, but the only reasonable place to put it would be within the
 (SubPlan N) reference,

 I thought maybe it could do something like this:

 SubPlan 1
   Parameters: $0 := b.oid
    -> Index Scan etc.

 No, that's the wrong end of the stick --- that's like trying to annotate
 a function definition with the actual parameter values being passed to
 it from somewhere else.  You haven't got the info there, and even if you
 did, it's assuming that there is exactly one call site for any subplan.

OK.  Will have to think this one over.

 I am under the (perhaps faulty) impression that when evaluating an
 expression there can only ever be three tuples in score: inner, outer,
 and scan.  So when we go to evaluate the expression whose result will
 be assigned to $0, where do we get those inner and/or outer and/or
 scan tuples from?  IOW, I understand where the subplan is putting its
 OUTPUT, what I don't understand is what context is being used to set
 its input parameters.

 Consider this small mod on your example:

 regression=# explain (verbose) select oid::int + 1,(select oid from pg_class 
 a where a.oid = b.relfilenode) from pg_class b;
                                 QUERY PLAN
 --------------------------------------------------------------------------
  Seq Scan on pg_catalog.pg_class b  (cost=0.00..5573.04 rows=671 width=8)
    Output: ((b.oid)::integer + 1), (SubPlan 1)
    SubPlan 1
      ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
            Output: a.oid
            Index Cond: (a.oid = $0)
 (6 rows)

 When we are evaluating the output targetlist of the seqscan node, we
 have a scan tuple of pg_class b in scope.  We can fetch that tuple's
 oid and use it in the first expression.

OK.

 We can also fetch that tuple's
 relfilenode and pass it to the subplan, which we do by setting the $0
 Param value before invoking the subplan.

Are the same tuples in scope when evaluating the expression that sets
$0 as were in scope when evaluating  ((b.oid)::integer + 1)?

 The subplan runs an indexscan
 and returns a single scalar value (to wit, a.oid from some row of
 pg_class a), which becomes the value of the (SubPlan 1) reference
 back at the evaluation of the seqscan's targetlist.

This part I get, 100%.

...Robert



Re: [HACKERS] explain and PARAM_EXEC

2010-02-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Feb 19, 2010 at 11:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 We can also fetch that tuple's
 relfilenode and pass it to the subplan, which we do by setting the $0
 Param value before invoking the subplan.

 Are the same tuples in scope when evaluating the expression that sets
 $0 as were in scope when evaluating  ((b.oid)::integer + 1)?

Yes, exactly the same.  The parameter-value expression is just like any
other scalar expression that could appear where the SubPlan reference
is.  It doesn't know anything about the subplan, really.

regards, tom lane
