Re: Proposing WITH ITERATIVE

2020-04-30 Thread Jeff Davis
On Tue, 2020-04-28 at 11:57 -0400, Jonah H. Harris wrote:
> Yeah, in that specific case, one of the other implementations seems
> to carry the counters along in the executor itself. But, as not all
> uses of this functionality are iteration-count-based, I think that's
> a little limiting. Using a terminator expression (of some kind) seems
> most adaptable, I think. I'll give some examples of both types of
> cases.

In my experience, graph algorithms or other queries doing more
specialized analysis tend to get pretty complex with lots of special
cases. Users may want to express these algorithms in a more familiar
language (R, Python, etc.), and to version the code (probably in an
extension).

Have you considered taking this to the extreme and implementing
something like User-Defined Table Operators[1]? Or is there a
motivation for writing such algorithms inline in SQL?

Regards,
Jeff Davis

[1] http://www.vldb.org/conf/1999/P47.pdf






Re: Proposing WITH ITERATIVE

2020-04-30 Thread Fabien COELHO



Hello,

more random thoughts about syntax, semantics, and keeping it relational.


While I'm not a huge fan of it, one of the other databases implementing
this functionality does so using the syntax:

WITH ITERATIVE R AS '(' R0 ITERATE Ri UNTIL N (ITERATIONS | UPDATES) ')' Qf

Where N in ITERATIONS represents termination at an explicit count and, in
UPDATES, represents termination after Ri updates more than n rows on table
R.

One of the main reasons I dislike the above is that it assumes N is 
known. In some cases, however, you really need termination upon a 
condition.


Yes, definitely, a (boolean?) condition is really needed, but possibly 
above N could be an expression, maybe with some separator before the 
query.


ISTM that using SELECT iterations is relational and close to the currently 
existing RECURSIVE. Separating the initialization and iterations with 
ITERATE is kind of the same approach than Peter's REPLACE, somehow, i.e. a 
new marker.


The above approach bothers me because it changes the query syntax a lot. 
The inside-WITH syntax should be the same as the normal query syntax.


First try. If we go to new markers, maybe the following, which kind of 
reuse Corey explicit condition, but replacing UPDATE with SELECT which 
makes it more generic:


 WITH R AS (
   ITERATE [STARTING] FROM R0
   WHILE/UNTIL condition REPEAT Ri
 );

Ok, it is quite procedural. It is really just a reordering of the syntax 
shown above, with a boolean condition thrown in and a heavy on (key)words 
SQL-like look and feel. It seems to make sense on a simple example:


 -- 1 by 1 count
 WITH counter(n) (
   ITERATE STARTING FROM
 SELECT 1
   WHILE n < 10 REPEAT
 SELECT n+1 FROM counter
 );

However I'm very unclear about the WHILE stuff, it makes some sense here 
because there is just one row, but what if there are severals?


 -- 2 by 2 count
 WITH counter(n) (
   ITERATE [STARTING FROM? OVER? nothing?]
 SELECT 1 UNION SELECT 2 -- cannot be empty? why not?
   WHILE n < 10 REPEAT
 -- which n it is just above?
 -- shoult it add a ANY/ALL semantics?
 -- should it really be a sub-query returning a boolean?
 -- eg: WHILE TRUE = ANY/ALL (SELECT n < 10 FROM counter)
 -- which I find pretty ugly.
 -- what else could it be?
 SELECT n+2 FROM counter
 );

Also, the overall syntax does not make much sense outside a WITH because 
one cannot reference the initial query which has no name.


Hmmm. Not very convincing:-) Let us try again.

Basically iterating is a 3 select construct: one for initializing, one for 
iterating, one for the stopping condition, with naming issues, the last 
point being exactly what WITH should solve.


by restricting the syntax to normal existing selects and moving things 
out:


  WITH stuff(n) AS
ITERATE OVER/FROM/STARTING FROM '(' initial-sub-query ')' -- or a table?
WHILE/UNTIL '(' condition-sub-query ')'
-- what is TRUE/FALSE? non empty? other?
-- WHILE/UNTIL [NOT] EXISTS '(' query ')' ??
REPEAT/DO/LOOP/... '(' sub-query-over-stuff ')'
  );

At least the 3 sub-queries are just standard queries, only the wrapping 
around (ITERATE ... WHILE/UNTIL ... REPEAT ...) is WITH specific, which is 
somehow better than having new separators in the query syntax itself. It 
is pretty relational inside, and procedural on the outside, the two levels 
are not mixed, which is the real win from my point of view.


ISTM that the key take away from the above discussion is to keep the 
overhead syntax in WITH, it should not be moved inside the query in any 
way, like adding REPLACE or WHILE or whatever there. This way there is 
minimal interference with future query syntax extensions, there is only a 
specific WITH-level 3-query construct with pretty explicit markers.


--
Fabien.




Re: Proposing WITH ITERATIVE

2020-04-29 Thread Jonah H. Harris
On Wed, Apr 29, 2020 at 5:54 PM Jonah H. Harris 
wrote:

> On Wed, Apr 29, 2020 at 4:50 PM Corey Huinker 
> wrote:
>
>> Having both WHERE and WHILE might look awkward.
>>>
>>
>> Maybe an UNTIL instead of WHILE?
>>
>
> While I'm not a huge fan of it, one of the other databases implementing
> this functionality does so using the syntax:
>
> WITH ITERATIVE R AS '(' R0 ITERATE Ri UNTIL N (ITERATIONS | UPDATES) ')' Qf
>
> Where N in ITERATIONS represents termination at an explicit count and, in
> UPDATES, represents termination after Ri updates more than n rows on table
> R.
>

Sent too soon :) One of the main reasons I dislike the above is that it
assumes N is known. In some cases, however, you really need termination
upon a condition.

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-29 Thread Jonah H. Harris
On Wed, Apr 29, 2020 at 4:50 PM Corey Huinker 
wrote:

> Having both WHERE and WHILE might look awkward.
>>
>
> Maybe an UNTIL instead of WHILE?
>

While I'm not a huge fan of it, one of the other databases implementing
this functionality does so using the syntax:

WITH ITERATIVE R AS '(' R0 ITERATE Ri UNTIL N (ITERATIONS | UPDATES) ')' Qf

Where N in ITERATIONS represents termination at an explicit count and, in
UPDATES, represents termination after Ri updates more than n rows on table
R.

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-29 Thread Corey Huinker
>
>
> > Perhaps something like this would be more readable
> >
> > WITH t AS (
> >UPDATE ( SELECT 1 AS ctr, 'x' as val )
> >SET ctr = ctr + 1, val = val || 'x'
> >WHILE ctr <= 100
> >RETURNING ctr, val
> > )
> >
> > The notion of an UPDATE on an ephemeral subquery isn't that special, see
> > "subquery2" in
> >
> https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm
> ,
>
> I must admit that I do not like much needing another level of subquery,
> but maybe it could just be another named query in the WITH statement.
>

So like this:
WITH initial_conditions as (SELECT 1 as ctr, 'x' as val)
UPDATE initial_conditions
SET ctr = ctr + 1, val = val || 'x'
WHILE ctr <= 100
RETURNING ctr, val


> ISTM that UPDATE is quite restrictive as the number of rows cannot
> change, which does not seem desirable at all? How could I add or remove
> rows from one iteration to the next?
>

My understanding was that maintaining a fixed number of rows was a desired
feature.


> ISTM that the WHILE would be checked before updating, so that WHILE FALSE
> does nothing, in which case its position after SET is odd.
>

True, but having the SELECT before the FROM is equally odd.


> Having both WHERE and WHILE might look awkward.
>

Maybe an UNTIL instead of WHILE?


>
> Also it looks much more procedural this way, which is the point, but also
> depart from the declarative SELECT approach of WITH RECURSIVE.
>

Yeah, just throwing it out as a possibility. Looking again at what I
suggested, it looks a bit like the Oracle "CONNECT BY level <= x" idiom.

I suspect that the SQL standards body already has some preliminary work
done, and we should ultimately follow that.


Re: Proposing WITH ITERATIVE

2020-04-29 Thread Fabien COELHO


Hello Corey, Hello Peter,

My 0.02 € about the alternative syntaxes:

Peter:


I think a syntax that would fit better within the existing framework
would be something like

WITH RECURSIVE t AS (
 SELECT base case
   REPLACE ALL  -- instead of UNION ALL
 SELECT recursive case
)


A good point about this approach is that the replacement semantics is 
clear, whereas using ITERATIVE with UNION is very misleading, as it is 
*not* a union at all.


This said I'm wondering how the parser would react.

Moreover, having a different syntax for normal queries and inside WITH 
query looks very undesirable from a language design point of view. This

suggests that the user should be able to write it anywhere:

  SELECT 1 REPLACE SELECT 2;

Well, maybe.

I'm unclear whether "REPLACE ALL" vs "REPLACE" makes sense, ISTM that 
there could be only one replacement semantics (delete the content and 
insert a new one)?


REPLACE should have an associativity defined wrt other operators:

  SELECT 1 UNION SELECT 2 REPLACE SELECT 3; -- how many rows?

I do not see anything obvious. Probably 2 rows.

Corey:


Obviously I'm very concerned about doing something that the SQL Standard
will clobber somewhere down the road. Having said that, the recursive
syntax always struck me as awkward even by SQL standards.


Indeed!


Perhaps something like this would be more readable

WITH t AS (
   UPDATE ( SELECT 1 AS ctr, 'x' as val )
   SET ctr = ctr + 1, val = val || 'x'
   WHILE ctr <= 100
   RETURNING ctr, val
)

The notion of an UPDATE on an ephemeral subquery isn't that special, see
"subquery2" in
https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm ,


I must admit that I do not like much needing another level of subquery, 
but maybe it could just be another named query in the WITH statement.


ISTM that UPDATE is quite restrictive as the number of rows cannot 
change, which does not seem desirable at all? How could I add or remove 
rows from one iteration to the next?


ISTM that the WHILE would be checked before updating, so that WHILE FALSE 
does nothing, in which case its position after SET is odd.


Having both WHERE and WHILE might look awkward.

Also it looks much more procedural this way, which is the point, but also 
depart from the declarative SELECT approach of WITH RECURSIVE.


--
Fabien.

Re: Proposing WITH ITERATIVE

2020-04-29 Thread Corey Huinker
On Wed, Apr 29, 2020 at 10:34 AM Jonah H. Harris 
wrote:

> On Wed, Apr 29, 2020 at 7:22 AM Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>
>> Yeah the RECURSIVE vs ITERATIVE is a bit of a red herring here.  As you
>> say, the RECURSIVE keyword doesn't specify the processing but marks the
>> fact that the specification of the query is recursive.
>>
>
> Agreed. I started thinking through Fabien's response last night.
>
> I think a syntax that would fit better within the existing framework
>> would be something like
>>
>> WITH RECURSIVE t AS (
>>  SELECT base case
>>REPLACE ALL  -- instead of UNION ALL
>>  SELECT recursive case
>> )
>>
>
> I was originally thinking more along the lines of Fabien's approach, but
> this is similarly interesting.
>

Obviously I'm very concerned about doing something that the SQL Standard
will clobber somewhere down the road. Having said that, the recursive
syntax always struck me as awkward even by SQL standards.

Perhaps something like this would be more readable

WITH t AS (
UPDATE ( SELECT 1 AS ctr, 'x' as val )
SET ctr = ctr + 1, val = val || 'x'
WHILE ctr <= 100
RETURNING ctr, val
)

The notion of an UPDATE on an ephemeral subquery isn't that special, see
"subquery2" in
https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm ,
so the only syntax here without precedence is dropping a WHILE into an
UPDATE statement.


Re: Proposing WITH ITERATIVE

2020-04-29 Thread Jonah H. Harris
On Wed, Apr 29, 2020 at 7:22 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> Yeah the RECURSIVE vs ITERATIVE is a bit of a red herring here.  As you
> say, the RECURSIVE keyword doesn't specify the processing but marks the
> fact that the specification of the query is recursive.
>

Agreed. I started thinking through Fabien's response last night.

I think a syntax that would fit better within the existing framework
> would be something like
>
> WITH RECURSIVE t AS (
>  SELECT base case
>REPLACE ALL  -- instead of UNION ALL
>  SELECT recursive case
> )
>

I was originally thinking more along the lines of Fabien's approach, but
this is similarly interesting.

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-29 Thread Peter Eisentraut

On 2020-04-29 07:09, Fabien COELHO wrote:

I'm wondering about how to use such a feature in the context of WITH query
with several queries having different behaviors. Currently "WITH"
introduces a named-query (like a view), "WITH RECURSIVE" introduces a mix
of recursive and named queries, pg really sees whether each one is
recursive or not, and "RECURSIVE" is required but could just be guessed.


Yeah the RECURSIVE vs ITERATIVE is a bit of a red herring here.  As you 
say, the RECURSIVE keyword doesn't specify the processing but marks the 
fact that the specification of the query is recursive.


I think a syntax that would fit better within the existing framework 
would be something like


WITH RECURSIVE t AS (
SELECT base case
  REPLACE ALL  -- instead of UNION ALL
SELECT recursive case
)

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Proposing WITH ITERATIVE

2020-04-28 Thread Fabien COELHO


Hello Jonah,

Nice.


-- No ORDER/LIMIT is required with ITERATIVE as only a single tuple is
present
EXPLAIN ANALYZE
WITH ITERATIVE fib_sum (iteration, previous_number, new_number)
 AS (SELECT 1, 0::numeric, 1::numeric
  UNION ALL
 SELECT (iteration + 1), new_number, (previous_number + new_number)
   FROM fib_sum
  WHERE iteration <= 1)
SELECT r.iteration, r.new_number
 FROM fib_sum r;


Nice.

My 0,02€ about the feature design:

I'm wondering about how to use such a feature in the context of WITH query 
with several queries having different behaviors. Currently "WITH" 
introduces a named-query (like a view), "WITH RECURSIVE" introduces a mix 
of recursive and named queries, pg really sees whether each one is 
recursive or not, and "RECURSIVE" is required but could just be guessed.


Now that there could be 3 variants in the mix, and for the feature to be 
orthogonal I think that it should be allowed. However, there is no obvious 
way to distinguish a RECURSIVE from an ITERATIVE, as it is more a 
behavioral thing than a structural one. This suggests allowing to tag each 
query somehow, eg before, which would be closer to the current approach:


  WITH
foo(i) AS (simple select),
RECURSIVE bla(i) AS (recursive select),
ITERATIVE blup(i) AS (iterative select),

or maybe after AS, which may be clearer because closer to the actual 
query, which looks better to me:


  WITH
foo(i) AS (simple select),
bla(i) AS RECURSIVE (recursive select),
boo(i) AS ITERATIVE (iterative select),
…


Also, with 3 cases I would prefer that the default has a name so someone 
can talk about it otherwise than saying "default". Maybe SIMPLE would 
suffice, or something else. ISTM that as nothing is expected between AS 
and the open parenthesis, there is no need to have a reserved keyword, 
which is a good thing for the parser.


--
Fabien.

Re: Proposing WITH ITERATIVE

2020-04-28 Thread Jonah H. Harris
On Mon, Apr 27, 2020 at 8:50 PM Jeff Davis  wrote:

> Can you illustrate with some examples? I get that one is appending and
> the other is modifying in-place, but how does this end up looking in
> the query language?
>

To ensure credit is given to the original authors, the initial patch I'm
working with (against Postgres 11 and 12) came from Denis Hirn, Torsten
Grust, and Christian Duta. Attached is a quick-and-dirty merge of that
patch that applies cleanly against 13-devel. It is not solid, at all, but
demonstrates the functionality. At present, my updates can be found at
https://github.com/jonahharris/postgres/tree/with-iterative

As the patch makes use of additional booleans for iteration, if there's
interest in incorporating this functionality, I'd like to discuss with Tom,
Robert, et al regarding the current use of booleans for CTE recursion
differentiation in parsing and planning. In terms of making it
production-ready, I think the cleanest method would be to use a bitfield
(rather than booleans) to differentiate recursive and iterative CTEs.
Though, as that would touch more code, it's obviously up for discussion.

I'm working on a few good standalone examples of PageRank, shortest path,
etc. But, the simplest Fibonacci example can be found here:

EXPLAIN ANALYZE
WITH RECURSIVE fib_sum (iteration, previous_number, new_number)
  AS (SELECT 1, 0::numeric, 1::numeric
   UNION ALL
  SELECT (iteration + 1), new_number, (previous_number + new_number)
FROM fib_sum
   WHERE iteration <= 1)
SELECT r.iteration, r.new_number
  FROM fib_sum r
 ORDER BY 1 DESC
 LIMIT 1;

QUERY PLAN

--
 Limit  (cost=3.81..3.81 rows=1 width=36) (actual time=44.418..44.418
rows=1 loops=1)
   CTE fib_sum
 ->  Recursive Union  (cost=0.00..3.03 rows=31 width=68) (actual
time=0.005..14.002 rows=10001 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=68) (actual
time=0.004..0.004 rows=1 loops=1)
   ->  WorkTable Scan on fib_sum  (cost=0.00..0.24 rows=3 width=68)
(actual time=0.001..0.001 rows=1 loops=10001)
 Filter: (iteration <= 1)
 Rows Removed by Filter: 0
   ->  Sort  (cost=0.78..0.85 rows=31 width=36) (actual time=44.417..44.417
rows=1 loops=1)
 Sort Key: r.iteration DESC
 Sort Method: top-N heapsort  Memory: 27kB
 ->  CTE Scan on fib_sum r  (cost=0.00..0.62 rows=31 width=36)
(actual time=0.009..42.660 rows=10001 loops=1)
 Planning Time: 0.331 ms
 Execution Time: 45.949 ms
(13 rows)

-- No ORDER/LIMIT is required with ITERATIVE as only a single tuple is
present
EXPLAIN ANALYZE
WITH ITERATIVE fib_sum (iteration, previous_number, new_number)
  AS (SELECT 1, 0::numeric, 1::numeric
   UNION ALL
  SELECT (iteration + 1), new_number, (previous_number + new_number)
FROM fib_sum
   WHERE iteration <= 1)
SELECT r.iteration, r.new_number
  FROM fib_sum r;

QUERY PLAN

--
 CTE Scan on fib_sum r  (cost=3.03..3.65 rows=31 width=36) (actual
time=11.626..11.627 rows=1 loops=1)
   CTE fib_sum
 ->  Recursive Union  (cost=0.00..3.03 rows=31 width=68) (actual
time=11.621..11.621 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=68) (actual
time=0.001..0.001 rows=1 loops=1)
   ->  WorkTable Scan on fib_sum  (cost=0.00..0.24 rows=3 width=68)
(actual time=0.001..0.001 rows=1 loops=10001)
 Filter: (iteration <= 1)
 Rows Removed by Filter: 0
 Planning Time: 0.068 ms
 Execution Time: 11.651 ms
(9 rows)


-- 
Jonah H. Harris


with_iterative_v1.patch
Description: Binary data


Re: Proposing WITH ITERATIVE

2020-04-28 Thread Jonah H. Harris
On Tue, Apr 28, 2020 at 11:51 AM Stephen Frost  wrote:

> Greetings Jonah!
>

Hey, Stephen. Hope you're doing well, man!


> One of the first question that we need to ask though, imv anyway, is- do
> the other databases use the WITH ITERATIVE syntax?  How many of them?
> Are there other approaches?  Has this been brought up to the SQL
> committee?
>

There are four that I'm aware of, but I'll put together a full list. I
don't believe it's been proposed to the standards committee, but I'll see
if I can find transcripts/proposals. I imagine those are all still public.

In general, we really prefer to avoid extending SQL beyond the standard,
> especially in ways that the standard is likely to be expanded.  In
> other words, we'd really prefer to avoid the risk that the SQL committee
> declares WITH ITERATIVE to mean something else in the future, causing us
> to have to have a breaking change to become compliant.


Agreed. That's my main concern as well.


> Now, if all the other major DB vendors have WITH ITERATIVE and they all
> work the same
> way, then we can have at least some confidence that the SQL committee
> will end up defining it in that way and there won't be any issue.
>

This is where it sucks, as each vendor has done it differently. One uses
WITH ITERATE, one uses WITH ITERATIVE, others use, what I consider to be, a
nasty operator-esque style... I think ITERATIVE makes the most sense, but
it does create a future contention as that definitely seems like the syntax
the committee would use as well.

Oracle ran into this issue as well, which is why they added an additional
clause to WITH that permits selection of depth/breadth-first search et al.
While it's kinda nasty, we could always similarly amend WITH RECURSIVE to
add an additional ITERATE or something to the tail of the with_clause rule.
But, that's why I'm looking for feedback :)

We do have someone on the committee who watches these lists, hopefully
> they'll have a chance to comment on this.  Perhaps it's already in
> discussion in the committee.
>

That would be awesome.

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-28 Thread Jonah H. Harris
On Mon, Apr 27, 2020 at 8:50 PM Jeff Davis  wrote:

> Hi,
>

Hey, Jeff. Long time no talk. Good to see you're still on here.

You might get better feedback in a month or so; right now we just got
> into feature freeze.
>

Yep. No hurry. I've just been playing with this and wanted to start getting
feedback. It's a side-project for me anyway, so time is limited.


> Can you illustrate with some examples? I get that one is appending and
> the other is modifying in-place, but how does this end up looking in
> the query language?
>

I'm putting together a few concrete real-world examples.

> Rather than stopping when no new tuples are generated, WITH ITERATIVE
> > stops when a user-defined predicate evaluates to true.
>
> Why stop when it evaluates to true, and not false?
>

It's how they implemented it. A few other databases have implemented
similar functionality but, as it's not standard, it's kinda just up to each
implementor. I'm not married to that idea, but it has worked well for me so
far.

It seems like the benefit comes from carrying information along within
> tuples (by adding to scores or counters) rather than appending tuples.
> Is it possible to achieve this in other ways? The recursive CTE
> implementation is a very direct implementation of the standard, perhaps
> there are smarter approaches?
>

Yeah, in that specific case, one of the other implementations seems to
carry the counters along in the executor itself. But, as not all uses of
this functionality are iteration-count-based, I think that's a little
limiting. Using a terminator expression (of some kind) seems most
adaptable, I think. I'll give some examples of both types of cases.

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-28 Thread Stephen Frost
Greetings Jonah!

* Jonah H. Harris (jonah.har...@gmail.com) wrote:
> On Tue, Apr 28, 2020 at 6:19 AM Andreas Karlsson  wrote:
> 
> > Do you have any examples of queries where it would help? It is pretty
> > hard to say how much value some new syntax adds without seeing how it
> > improves an intended use case.
> 
> Thanks for the response. I'm currently working on a few examples per Jeff's
> response along with some benchmark information including improvements in
> response time and memory usage of the current implementation. In the
> meantime, as this functionality has been added to a couple of other
> databases and there's academic research on it, if you're interested, here's
> a few papers with examples:
> 
> http://faculty.neu.edu.cn/cc/zhangyf/papers/2018-ICDCS2018-sqloop.pdf
> http://db.in.tum.de/~passing/publications/dm_in_hyper.pdf

Nice!

One of the first question that we need to ask though, imv anyway, is- do
the other databases use the WITH ITERATIVE syntax?  How many of them?
Are there other approaches?  Has this been brought up to the SQL
committee?

In general, we really prefer to avoid extending SQL beyond the standard,
especially in ways that the standard is likely to be expanded.  In
other words, we'd really prefer to avoid the risk that the SQL committee
declares WITH ITERATIVE to mean something else in the future, causing us
to have to have a breaking change to become compliant.  Now, if all the
other major DB vendors have WITH ITERATIVE and they all work the same
way, then we can have at least some confidence that the SQL committee
will end up defining it in that way and there won't be any issue.

We do have someone on the committee who watches these lists, hopefully
they'll have a chance to comment on this.  Perhaps it's already in
discussion in the committee.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Proposing WITH ITERATIVE

2020-04-28 Thread Jonah H. Harris
On Tue, Apr 28, 2020 at 6:19 AM Andreas Karlsson  wrote:

> Do you have any examples of queries where it would help? It is pretty
> hard to say how much value some new syntax adds without seeing how it
> improves an intended use case.
>

Hey, Andreas.

Thanks for the response. I'm currently working on a few examples per Jeff's
response along with some benchmark information including improvements in
response time and memory usage of the current implementation. In the
meantime, as this functionality has been added to a couple of other
databases and there's academic research on it, if you're interested, here's
a few papers with examples:

http://faculty.neu.edu.cn/cc/zhangyf/papers/2018-ICDCS2018-sqloop.pdf
http://db.in.tum.de/~passing/publications/dm_in_hyper.pdf

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-28 Thread Jonah H. Harris
On Tue, Apr 28, 2020 at 6:16 AM Oleksandr Shulgin <
oleksandr.shul...@zalando.de> wrote:

> will help the community to focus on the actual details of your proposal.
>

I'd like it if the community would focus on the details of the proposal as
well :)

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-28 Thread Andreas Karlsson
On 4/27/20 6:52 PM, Jonah H. Harris wrote:> If there's any interest, 
I'll clean-up their patch and submit. Thoughts?


Hi,

Do you have any examples of queries where it would help? It is pretty 
hard to say how much value some new syntax adds without seeing how it 
improves an intended use case.


Andreas




Re: Proposing WITH ITERATIVE

2020-04-28 Thread Oleksandr Shulgin
On Tue, Apr 28, 2020 at 5:49 AM Jonah H. Harris 
wrote:

> On Mon, Apr 27, 2020 at 11:33 PM David Fetter  wrote:
>
>>
>> Have the authors agreed to make it available to the project under a
>> compatible license?
>
>
> If there’s interest, obviously. Otherwise I wouldn’t be asking.
>
> I said from the start why I wasn’t attaching a patch and that I was seeing
> feedback. Honestly, David, stop wasting my, and list time, asking
> pointless off-topic questions.
>

Jonah,

I see it the other way round—it could end up as a waste of everyone's time
discussing the details, if the authors don't agree to publish their code in
the first place.  Of course, it could also be written from scratch, in
which case I guess someone else from the community (who haven't seen that
private code) would have to take a stab at it, but I believe it helps to
know this in advance.

I also don't see how it "obviously" follows from your two claims: "I've
found this functionality" and "I'll clean-up their patch and submit", that
you have even asked (or, for that matter—found) the authors of that code.

Finally, I'd like to suggest you adopt a more constructive tone and become
familiar with the project's Code of Conduct[1], if you haven't yet.  I am
certain that constructive, respectful communication from your side will
help the community to focus on the actual details of your proposal.

Kind regards,
-- 
Alex

[1] https://www.postgresql.org/about/policies/coc/


Re: Proposing WITH ITERATIVE

2020-04-27 Thread Jonah H. Harris
On Mon, Apr 27, 2020 at 11:33 PM David Fetter  wrote:

>
> Have the authors agreed to make it available to the project under a
> compatible license?


If there’s interest, obviously. Otherwise I wouldn’t be asking.

I said from the start why I wasn’t attaching a patch and that I was seeing
feedback. Honestly, David, stop wasting my, and list time, asking pointless
off-topic questions.

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-27 Thread David Fetter
On Mon, Apr 27, 2020 at 10:44:04PM -0400, Jonah H. Harris wrote:
> On Mon, Apr 27, 2020 at 10:32 PM David Fetter  wrote:
> > On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote:
> > > Hey, everyone.
> >
> > > If there's any interest, I'll clean-up their patch and submit. Thoughts?
> >
> > Where's the current patch?
> 
> 
> It’s private. Hence, why I’m inquiring as to interest.

Have the authors agreed to make it available to the project under a
compatible license?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Proposing WITH ITERATIVE

2020-04-27 Thread Jonah H. Harris
On Mon, Apr 27, 2020 at 10:32 PM David Fetter  wrote:

> On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote:
> > Hey, everyone.
>
> > If there's any interest, I'll clean-up their patch and submit. Thoughts?
>
> Where's the current patch?


It’s private. Hence, why I’m inquiring as to interest.

-- 
Jonah H. Harris


Re: Proposing WITH ITERATIVE

2020-04-27 Thread David Fetter
On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote:
> Hey, everyone.

> If there's any interest, I'll clean-up their patch and submit. Thoughts?

Where's the current patch?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Proposing WITH ITERATIVE

2020-04-27 Thread Jeff Davis
Hi,

You might get better feedback in a month or so; right now we just got
into feature freeze.

On Mon, 2020-04-27 at 12:52 -0400, Jonah H. Harris wrote:
> In that it can reference a relation within its definition, this
> iterative variant has similar capabilities as recursive CTEs. In
> contrast to recursive CTEs, however, rather than appending new
> tuples, this variant performs a replacement of the intermediate
> relation. As such, the final result consists of a relation containing
> tuples computed during the last iteration only. Why would we want
> this?

Can you illustrate with some examples? I get that one is appending and
the other is modifying in-place, but how does this end up looking in
the query language?

> This type of computation pattern is often found in data and graph
> mining algorithms.

It certainly sounds useful.

> Rather than stopping when no new tuples are generated, WITH ITERATIVE
> stops when a user-defined predicate evaluates to true.

Why stop when it evaluates to true, and not false?

> First, iterative CTEs consume significantly less memory. Consider a
> CTE of N tuples and I iterations. With recursive CTEs, such a
> relation would grow to (N * I) tuples. With iterative CTEs, however,
> all prior iterations are discarded early. As such, the size of the
> relation would remain N, requiring only a maximum of (2 * N) tuples
> for comparisons of the current and the prior iteration. Furthermore,
> in queries where the number of executed iterations represents the
> stop criterion, recursive CTEs require even more additional per-tuple 
> overhead - to carry along the iteration counter.

It seems like the benefit comes from carrying information along within
tuples (by adding to scores or counters) rather than appending tuples.
Is it possible to achieve this in other ways? The recursive CTE
implementation is a very direct implementation of the standard, perhaps
there are smarter approaches?

Regards,
Jeff Davis





Proposing WITH ITERATIVE

2020-04-27 Thread Jonah H. Harris
Hey, everyone.

It's been quite a while since I last contributed a patch but, as this is a
new feature, I wanted to gather feedback before doing so. I've found this
functionality, already in use at Universität Tübingen, to be both highly
beneficial in many of my queries as well as a small change to Postgres - a
good trade-off IMO.

As you know, Postgres currently supports SQL:1999 recursive common table
expressions, constructed using WITH RECURSIVE, which permit the computation
of growing relations (e.g., transitive closures.) Although it is possible
to use this recursion for general-purpose iterations, doing so is a
deviation from its intended use case. Accordingly, an iterative-only use of
WITH RECURSIVE often results in sizable overhead and poor optimizer
decisions. In this email, I'd like to propose a similar but
iterative-optimized form of CTE - WITH ITERATIVE.

In that it can reference a relation within its definition, this iterative
variant has similar capabilities as recursive CTEs. In contrast to
recursive CTEs, however, rather than appending new tuples, this variant
performs a replacement of the intermediate relation. As such, the final
result consists of a relation containing tuples computed during the last
iteration only. Why would we want this?

This type of computation pattern is often found in data and graph mining
algorithms. In PageRank, for example, the initial ranks are updated in each
iteration. In clustering algorithms, assignments of data tuples to clusters
are updated in every iteration. Something these types of algorithms have in
common is that they operate on fixed-size datasets, where only specific
values (ranks, assigned clusters, etc.) are updated.

Rather than stopping when no new tuples are generated, WITH ITERATIVE stops
when a user-defined predicate evaluates to true. By providing a
non-appending iteration concept with a while-loop-style stop criterion, we
can massively speed up query execution due to better optimizer decisions.
Although it is possible to implement the types of algorithms mentioned
above using recursive CTEs, this feature has two significant advantages:

First, iterative CTEs consume significantly less memory. Consider a CTE of
N tuples and I iterations. With recursive CTEs, such a relation would grow
to (N * I) tuples. With iterative CTEs, however, all prior iterations are
discarded early. As such, the size of the relation would remain N,
requiring only a maximum of (2 * N) tuples for comparisons of the current
and the prior iteration. Furthermore, in queries where the number of
executed iterations represents the stop criterion, recursive CTEs require
even more additional per-tuple overhead - to carry along the iteration
counter.

Second, iterative CTEs have lower query response times. Because of its
smaller size, the time to scan and process the intermediate relation, to
re-compute ranks, clusters, etc., is significantly reduced.

In short, iterative CTEs retain the flexibility of recursive CTEs, but
offer a significant advantage for algorithms which don't require results
computed throughout all iterations. As this feature deviates only slightly
from WITH RECURSIVE, there is very little code required to support it (~250
loc.)

If there's any interest, I'll clean-up their patch and submit. Thoughts?

-- 
Jonah H. Harris