Re: [HACKERS] Recursive query syntax ambiguity

2007-02-05 Thread Martijn van Oosterhout
On Mon, Jan 29, 2007 at 01:38:02PM +, Gregory Stark wrote:
 Instead I suggest we create one type of reentrant node, which memoizes its
 output. It would be a kind of on-demand Materialize. It would be paired with a
 second node type which would be the only type of node which can invoke it.
 This RecursiveCall node would invoke the Memoize node using a special
 interface in which it passes enough state for the Memoize node to seek to the
 correct place in its output.

That I beleive is the right approach. I think an equivalent way of
looking at it is a Loop node with an InitPlan and a StepPlan.

Initially, the Loop node executes the InitPlan to get it's initial set
of tuples, storing them like a Materialize node does. After that it
keeps calling the StepNode, where somewhere inside the it has a node
that extracts a row from the aforementioned tuplestore. It stores these
returned tuples in the tuplestore also, thus giving you recursion.

snip

 (I've convinced myself that that's true but I should probably work out
 a good proof of it before I make all this depend on it.)

Yeah, proving it is going to be tricky, I'm not sure what the standard
says about infinite recursion.

 There are three general cases of the Memoize node. Breadth-first, Depth-first,
 and non-linearly-recursive.

I think the the only difference between depth and bredth-first searches
is (if you consider the tuplesort to be a queue) whether the new tuples
go to the front or the back of the list. But a data structures and
algorithms book will know this.

 There are a few open issues to consider. Namely, how to cost a RecursiveCall
 node.

One note: if you know that if you get p tuples out for every tuple in
(where p1) then the asymptotic result of 1 + p + p*p+ ... is 1/(1-p)

However, I don't know it matters. You only need to cost the plan if
there are alternate paths and given the plan structure is strongly
constrained, I'm not sure how much it matters.

 Also, if a subplan has exactly one call-site we really ought to inline it as
 it will get much more reliable plans. Similarly if there are two call sites we
 should consider inlining it if the cost of materializing the results (and
 reading them back) is more than n-call-sites x the cost of executing the
 query. I would expect That would happen with plain sequential scans for
 example.

In the case where the subplan has side-effects, you can't optimise at
all. In the case of read-committed mode, will two seq-scans always
return the same result?

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Recursive query syntax ambiguity

2007-02-05 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 However, I don't know it matters. You only need to cost the plan if
 there are alternate paths and given the plan structure is strongly
 constrained, I'm not sure how much it matters.

It does, since the whole thing could be a subquery, in which case there
could be options available at the outer level.  I doubt we'll be able to
be really smart, but that doesn't mean we can just punt.

 In the case of read-committed mode, will two seq-scans always
 return the same result?

They definitely should, since we'll be using the same snapshot
throughout the query.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-29 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Having fixed that everything works fine with SET and WITH being reserved
 keywords. You didn't mean to say I should be able to leave WITH unreserved 
 did
 you?

 I think we'd decided that was a lost cause, unless you see a way?

No, I decided it was a lost cause right at the start of this. I was just
double checking that you didn't think otherwise when you said you didn't see
anything else that needed to be reserved.

 Of course that was the easy part...

 Yup ;-)


I think I've finally wrapped my head around the idea of mutually recursive and
non-linearly recursive queries. Good thing too since in thinking about them I
realized that the approach I had intended to use for plain non-recursive
queries wasn't really adequate and needs the same machinery.

The fundamental feature Postgres needs to implement this that it doesn't have
is the ability to be running the same subplan from two different contexts in
the plan simultaneously. That is, the ability to start a scan from the
beginning without disturbing a scan already in progress for that same node.

In retrospect this isn't terribly surprising since it's the same
characteristic functions need in a programming language to be safely callable
recursively. They need to be reentrant.

The normal way to make functions reentrant is to remove all local state and
make the caller responsible for providing space to allocate memory for
temporary state. But teaching every node type about passing its state up to
the parent including bundling up all the state of its children seems like too
much work. Both for me and the computer :)

In any case that would result in an n^2 algorithm for recursive queries as it
would force the executor to re-execute each node for the same record over and
over. Much as the obvious recursive fibonacci algorithm is n^2 for example.
It also would mean making non-recursive WITH clauses pointless since they
would still be executed once per call-site.

Instead I suggest we create one type of reentrant node, which memoizes its
output. It would be a kind of on-demand Materialize. It would be paired with a
second node type which would be the only type of node which can invoke it.
This RecursiveCall node would invoke the Memoize node using a special
interface in which it passes enough state for the Memoize node to seek to the
correct place in its output.

Because it memoizes its output it should never actually need to be able to
handle recursive calls. When the call depth is maximum each call site would be
able to make at most 1 call to the Memoize node. A second one would
necessarily have been stopped higher up the call chain by the memoization
table. (I've convinced myself that that's true but I should probably work out
a good proof of it before I make all this depend on it.)

There are three general cases of the Memoize node. Breadth-first, Depth-first,
and non-linearly-recursive. 

In the case of breadth-first search we would want to keep two tuplestore
around, one for the current generation, and one for its parent. Once we finish
a generation we can free its parent since we know a linearly-recursive scan
won't be needing it any more.

In the case of depth-first we really want to memoize into a stack. I'm not
sure how to fit that into our current model of tuplestores. At worst though,
we just allocate a tuplestore for each generation as it appears. We can still
throw away old generations once their scans are finished, it just won't be
until quite late.

Non-linearly-recursive searches are the same case as non-recursive queries.
Memoize would just allocate one tuplestore and store tuples blindly without
any idea of generation. It could arrange to throw away old tuples whenever
all call-sites' scans have past them. That's the same optimization Simon wants
to make in the merge-join case when the mark is moved forward.

There are a few open issues to consider. Namely, how to cost a RecursiveCall
node.

Also, if a subplan has exactly one call-site we really ought to inline it as
it will get much more reliable plans. Similarly if there are two call sites we
should consider inlining it if the cost of materializing the results (and
reading them back) is more than n-call-sites x the cost of executing the
query. I would expect That would happen with plain sequential scans for
example.

Note that we need the Memoize/RecursiveCall nodes even for non-recursive
queries. Except in the simplest cases where each common table expression only
has a single call site (in which case we could have just inlined them anyways)
we need some way to retain state and avoid reexecuting the plan multiple
times.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Gregory Stark

Hm, I had hoped that the DB2/ANSI syntax would only require making WITH a
fully reserved word, and not the other tokens it uses. Certainly for
non-recursive queries that's the case as the only other token it uses is AS
which is already a fully reserved word.

However to fully support the DB2/ANSI syntax we would definitely have an
ambiguity and I think we would have to make CYCLE a fully reserved word
which seems like a much bigger concession than WITH. Observe the following
case:

  WITH RECURSIVE foo (x,y) AS (select 1,2) SEARCH DEPTH FIRST BY x CYCLE x,y 
SET ...

The parser can't search arbitrarily far checking for a SET to see if the CYCLE
is a keyword or a binary operator. Even if it could things like this would be
entirely ambiguous:

  WITH RECURSIVE foo (x,y) AS (select 1,2) SEARCH DEPTH FIRST BY x CYCLE x, y 
CYCLE y SET ...

I'm nowhere near actually implementing this functionality yet so there's no
pressing need for action. In fact I think the search clause is actually an
ANSIism that isn't supported by DB2 itself yet either.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Gregory Stark

Woah, I just realized it's much worse than that. I think the syntax in the
ANSI is not parsable in LALR(1) at all. Note the following:

WITH RECURSIVE foo (a,b) AS (subq) SEARCH BREADTH FIRST BY a,b,c(x,z),d(y,z) AS 
(subq) SELECT ...

To determine whether c is the name of a new with list element it has to
scan as far ahead as the , before the d. Note that d here is in fact not
part of the search clause at all, it's the name of a second with list
element.

bleagh.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Andrew Dunstan

Gregory Stark wrote:

Woah, I just realized it's much worse than that. I think the syntax in the
ANSI is not parsable in LALR(1) at all. Note the following:

WITH RECURSIVE foo (a,b) AS (subq) SEARCH BREADTH FIRST BY a,b,c(x,z),d(y,z) AS 
(subq) SELECT ...

To determine whether c is the name of a new with list element it has to
scan as far ahead as the , before the d. Note that d here is in fact not
part of the search clause at all, it's the name of a second with list
element.

bleagh.

  


Can you post the rules you have so far that you're playing around with? 
(Also maybe the rules from the standard - I don't have a copy handy).


cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Gregory Stark

Andrew Dunstan [EMAIL PROTECTED] writes:

 Can you post the rules you have so far that you're playing around with? (Also
 maybe the rules from the standard - I don't have a copy handy).

This is the best compromise I've come up with so far. It makes CYCLE a
reserved word and requires a CYCLE clause if there's a SEARCH clause.

--- gram.y  09 Jan 2007 02:14:14 +  2.573
+++ gram.y  26 Jan 2007 20:02:21 +  
@@ -350,6 +350,10 @@
 %type node   xml_root_version opt_xml_root_standalone
 %type booleandocument_or_content xml_whitespace_option
 
+%type node   common_table_expression
+%type list   with_cte_list cte_list
+%type boolean recursive_is_depth_first
+
 
 /*
  * If you make any token changes, update the keyword table in
@@ -364,7 +368,7 @@
ASSERTION ASSIGNMENT ASYMMETRIC AT AUTHORIZATION
 
BACKWARD BEFORE BEGIN_P BETWEEN BIGINT BINARY BIT
-   BOOLEAN_P BOTH BY
+   BOOLEAN_P BOTH BREADTH BY
 
CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P
CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE
@@ -376,7 +380,7 @@
 
DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS
-   DESC DISABLE_P DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
+   DEPTH DESC DISABLE_P DISTINCT DO DOCUMENT_P DOMAIN_P DOUBLE_P DROP
 
EACH ELSE ENABLE_P ENCODING ENCRYPTED END_P ESCAPE EXCEPT EXCLUDING
EXCLUSIVE EXECUTE EXISTS EXPLAIN EXTERNAL EXTRACT
@@ -416,11 +420,11 @@
 
QUOTE
 
-   READ REAL REASSIGN RECHECK REFERENCES REINDEX RELATIVE_P RELEASE RENAME
+   READ REAL REASSIGN RECHECK RECURSIVE REFERENCES REINDEX RELATIVE_P 
RELEASE RENAME
REPEATABLE REPLACE RESET RESTART RESTRICT RETURNING RETURNS REVOKE RIGHT
ROLE ROLLBACK ROW ROWS RULE
 
-   SAVEPOINT SCHEMA SCROLL SECOND_P SECURITY SELECT SEQUENCE
+   SAVEPOINT SCHEMA SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE
SERIALIZABLE SESSION SESSION_USER SET SETOF SHARE
SHOW SIMILAR SIMPLE SMALLINT SOME STABLE STANDALONE_P START STATEMENT
STATISTICS STDIN STDOUT STORAGE STRICT_P STRIP_P SUBSTRING SUPERUSER_P
@@ -5681,6 +5685,25 @@

list_nth($3, 0), list_nth($3, 1));
$$ = $1;
}
+   | with_cte_list simple_select   
{ $$ = $2; }
+   | with_cte_list select_clause sort_clause
+   {
+   insertSelectOptions((SelectStmt *) $2, 
$3, NIL,
+   
NULL, NULL);
+   $$ = $2;
+   }
+   | with_cte_list select_clause opt_sort_clause 
for_locking_clause opt_select_limit
+   {
+   insertSelectOptions((SelectStmt *) $2, 
$3, $4,
+   
list_nth($5, 0), list_nth($5, 1));
+   $$ = $2;
+   }
+   | with_cte_list select_clause opt_sort_clause 
select_limit opt_for_locking_clause
+   {
+   insertSelectOptions((SelectStmt *) $2, 
$3, $5,
+   
list_nth($4, 0), list_nth($4, 1));
+   $$ = $2;
+   }
;
 
 select_clause:
@@ -5742,6 +5765,72 @@
}
;
 
+/*
+ * ANSI standard WITH clause looks like:
+ * WITH [ RECURSIVE ] query name [ (column,...) ] AS (query) [SEARCH or 
CYCLE clause] 
+ * 
+ * It seems with_cte_list has to be a separate token or else there's a s/r
+ * conflict between RECURSIVE and the cte name. This means we'll need a silly
+ * node just to hold the list and the recursive flag :( it doesn't seem like
+ * making RECURSIVE a fully reserved word would be very nice as it's probably a
+ * common column name.
+ *
+ * For now we don't support recursive so just ignore it and pass up the list
+ *
+ */
+with_cte_list:
+ WITH cte_list
+   { 
+   $$ = $2; 
+   }
+  | WITH RECURSIVE cte_list 
+   { 
+   elog(WARNING, WITH RECURSIVE not supported 
yet);  
+   $$ = $3; 
+   }
+ ;
+
+cte_list:
+ common_table_expression   
{ $$ = list_make1($1); }
+   | 

Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Martijn van Oosterhout
On Fri, Jan 26, 2007 at 05:10:01PM +, Gregory Stark wrote:
 However to fully support the DB2/ANSI syntax we would definitely have an
 ambiguity and I think we would have to make CYCLE a fully reserved word
 which seems like a much bigger concession than WITH. Observe the following
 case:
 
   WITH RECURSIVE foo (x,y) AS (select 1,2) SEARCH DEPTH FIRST BY x CYCLE x,y 
 SET ...
 
 The parser can't search arbitrarily far checking for a SET to see if the CYCLE
 is a keyword or a binary operator.

Er, CYCLE isn't a binary operator, and users can't make binary
operators that are words, so I'm not sure of the problem here.
I think the parser can tell that the expression ends at the word
cycle.

Or am I missing obvious?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Martijn van Oosterhout
Ok, looking at your example:

WITH RECURSIVE foo (a,b) AS (subq) SEARCH BREADTH FIRST BY a,b , c(x,z),d(y,z) 
AS (subq) SELECT ...

What you're trying to say is that the c is a with list element,
not a cycle column. But the parser will see that as soon as it hits
the open parenthesis, since a cycle column is always just a column
name.

Also, the AS is the with list element doesn't appear to be optional,
I assume you left that out after the c(x,z) for clarity.

I think bison should be able to handle this as long as the name in
common_table_expression matches exactly the same things as whatever
columnList uses. It can the merge the two parse paths, allowing it to
see further.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 Er, CYCLE isn't a binary operator, and users can't make binary
 operators that are words, so I'm not sure of the problem here.

Well, the problem typically is not being able to tell whether an
operator is supposed to be infix or postfix; hence keywords that can
terminate arbitrary expressions usually have to be reserved words.
However, now that I look at the syntax I think Greg may be misreading
it.  I see

search or cycle clause ::=
  search clause
| cycle clause
| search clause cycle clause

search clause ::=
SEARCH recursive search order SET sequence column

recursive search order ::=
  DEPTH FIRST BY sort specification list
| BREADTH FIRST BY sort specification list

sequence column ::= column name

cycle clause ::=
  CYCLE cycle column list SET cycle mark column TO cycle
  mark value DEFAULT non-cycle mark value USING path column

cycle column list ::= cycle column [ {commacycle column}...]

cycle column ::= column name

cycle mark column ::= column name

path column ::= column name

cycle mark value ::= value expression

non-cycle mark value ::= value expression

and so CYCLE would come *after* SET sequence column not before it.
It looks to me like we'd have to promote SET to fully reserved status,
but that probably isn't going to surprise anyone.  DEFAULT and USING
already are fully reserved.  I don't see anything else here that looks
like it should need to be reserved.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 search clause ::=
   SEARCH recursive search order SET sequence column

 and so CYCLE would come *after* SET sequence column not before it.

Ah, thanks, I had glossed right over the SET sequence column bit. The SET
that I had was the SET cycle column which remains after the CYCLE keyword.

 It looks to me like we'd have to promote SET to fully reserved status,
 but that probably isn't going to surprise anyone.  DEFAULT and USING
 already are fully reserved.  I don't see anything else here that looks
 like it should need to be reserved.

Having fixed that everything works fine with SET and WITH being reserved
keywords. You didn't mean to say I should be able to leave WITH unreserved did
you?

Of course that was the easy part...

Implementing non-recursive common table expressions should be fairly
mechanical though I think I'll have lots of questions about how to get all the
variable references fixed up.

Non-recursive common table expressions are always non-correlated. They can
refer to previous common table expressions but only to select from them either
in the FROM clause or in subqueries. So as far as I can see they can just go
in an InitPlan (or One-Time-Plan? I'm not sure what the distinction is) and be
referred to in the same way.

Recursive queries are of course a whole lot trickier. I've been slowly
wrapping my head around them. So far I have a pretty good idea how to churn
out a typical recursive query analogous to a CONNECT BY query. 

But the spec is a lot more ambitious than that. I haven't quite wrapped my
head around the idea of mutually recursive or non-linearly-recursive queries
yet.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Recursive query syntax ambiguity

2007-01-26 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Having fixed that everything works fine with SET and WITH being reserved
 keywords. You didn't mean to say I should be able to leave WITH unreserved did
 you?

I think we'd decided that was a lost cause, unless you see a way?

 Of course that was the easy part...

Yup ;-)

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate