Re: [HACKERS] Recursive query syntax ambiguity

Martijn van Oosterhout Mon, 05 Feb 2007 08:18:46 -0800

On Mon, Jan 29, 2007 at 01:38:02PM +0000, Gregory Stark wrote:
> Instead I suggest we create one type of reentrant node, which memoizes its
> output. It would be a kind of on-demand Materialize. It would be paired with a
> second node type which would be the only type of node which can invoke it.
> This RecursiveCall node would invoke the Memoize node using a special
> interface in which it passes enough state for the Memoize node to seek to the
> correct place in its output.


That I beleive is the right approach. I think an equivalent way of
looking at it is a "Loop" node with an InitPlan and a StepPlan.

Initially, the Loop node executes the InitPlan to get it's initial set
of tuples, storing them like a Materialize node does. After that it
keeps calling the StepNode, where somewhere inside the it has a node
that extracts a row from the aforementioned tuplestore. It stores these
returned tuples in the tuplestore also, thus giving you recursion.

<snip>

> (I've convinced myself that that's true but I should probably work out
> a good proof of it before I make all this depend on it.)

Yeah, proving it is going to be tricky, I'm not sure what the standard
says about infinite recursion.

> There are three general cases of the Memoize node. Breadth-first, Depth-first,
> and non-linearly-recursive.

I think the the only difference between depth and bredth-first searches
is (if you consider the tuplesort to be a queue) whether the new tuples
go to the front or the back of the list. But a data structures and
algorithms book will know this.

> There are a few open issues to consider. Namely, how to cost a RecursiveCall
> node.

One note: if you know that if you get p tuples out for every tuple in
(where p<1) then the asymptotic result of 1 + p + p*p+ ... is 1/(1-p)

However, I don't know it matters. You only need to cost the plan if
there are alternate paths and given the plan structure is strongly
constrained, I'm not sure how much it matters.

> Also, if a subplan has exactly one call-site we really ought to inline it as
> it will get much more reliable plans. Similarly if there are two call sites we
> should consider inlining it if the cost of materializing the results (and
> reading them back) is more than n-call-sites x the cost of executing the
> query. I would expect That would happen with plain sequential scans for
> example.

In the case where the subplan has side-effects, you can't optimise at
all. In the case of read-committed mode, will two seq-scans always
return the same result?

Hope this helps,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

signature.asc
Description: Digital signature

Re: [HACKERS] Recursive query syntax ambiguity

Reply via email to