Re: [HACKERS] Internal design of MERGE, with Rules

2008-05-08 Thread Simon Riggs
On Wed, 2008-04-30 at 16:58 +0100, Simon Riggs wrote:
 The main query will then look like this
 
 select   target.ctid
 ,case when-not-matched (as above)
 ,case when-matched (as above)
 ,(all other columns required for side queries)  
 from source-query left outer join target on join-condition
 where (when-matched-condition-0
 or when-matched-condition-1
 ...
 or when-matched-condition-N)
 or (when-not-matched-condition-0
 or when-not-matched-condition-1
 ...
 or when-not-matched-condition-N)
 
 The WHERE clause is likely required in case we get queries like this
 
 MERGE target t
 USING (select * from source) s
 ON (s.pkey = t.pkey)
 WHEN MATCHED AND s.pkey = $1
 UPDATE SET col = $2;
 
 which would be perfectly valid, even if we might hope that they had
 coded like this
 
 MERGE target
 USING (select * from source WHERE index-column = $1)
 ON (join-condition)
 WHEN MATCHED 
 UPDATE SET col = $2; 

Peter has just jogged my memory about double evaluation of volatile
functions, so the above transformation isn't correct.

We would not be able to fully optimise a MERGE statement like this 

MERGE target t
USING (select * from source) s
 ON (s.pkey = t.pkey)
 WHEN MATCHED AND s.key = $1
UPDATE SET col = $2;

since we won't be able to pass the clause s.pkey = $1 down into the s
query so it would use an index. The following statement will be faster,
but will in all cases give an identical result:

MERGE target t
USING (select * from source WHERE key = $1) s
 ON (s.pkey = t.pkey)
 WHEN MATCHED
UPDATE SET col = $2;

I don't think its too important, since the latter is the way people
would have used MERGE in SQL:2003 anyway.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Internal design of MERGE, with Rules

2008-05-01 Thread Simon Riggs
On Thu, 2008-05-01 at 00:26 +0100, Sam Mason wrote:

 On Wed, Apr 30, 2008 at 04:58:52PM +0100, Simon Riggs wrote:
  That means we probably need to introduce new infrastructure in the tcop
  or executor modules to handle queries-within-queries. This isn't
  special-casing MERGE so much as introducing infrastructure for a new
  class of query, such as MERGE, REPLACE, INSERT ELSE UPDATE. (Merge
  itself does cover almost all cases of this type of query, but we'd be
  able to fairly easily support all of the different syntax).
  
  MERGE would then be represented by a query that has many side
  queries (so called so we don't confused calling them sub-queries).
 
 Why make them special cases?  (I'm sure there's a good reason!)

Well, I'm not making them special cases. The infrastructure would be
generalised for any statement type that wanted to do roughly this.

 I've sometimes wanted to be able to put DML statements inside SELECT
 statements.  The following is a slightly reasonable example:
 
   INSERT INTO ilog (i,ts,n)
 SELECT i, now(), COUNT(*)
 FROM (
   INSERT INTO bar (x,y)
 SELECT 5, n
 FROM baz
 WHERE i = 10
 RETURNING i) x
 GROUP BY i;

OK, but that's not what I'm working on... useful as it sounds.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Internal design of MERGE, with Rules

2008-04-30 Thread Simon Riggs
MERGE looks like it may need some new infrastructure to support it,
depending upon the implementation route. Guidance and discussion
requested from main hackers, if possible.

This is a separate post because there's no further discussion here on
the external behaviour of MERGE, or its concurrency/locking.

Assumptions
---

* MERGE must be a parameterisable, preparable command. The parameters
might be anywhere in the statement i.e. in the USING clause, or in the
action statements, or both.

* MERGE must work like any other statement, e.g. multi-statement
requests must work also e.g. UPDATE...; MERGE ... ;INSERT ...

* MERGE works on tables only, not views. PostgreSQL doesn't support
updateable views, so this isn't a problem to solve as part of this
implementation.

* MERGE contains possibly multiple INSERTs, UPDATEs and DELETEs, each of
which can invoke a rule to generate even more commands. 

* We would not add MERGE to the list of possible events that invoke a
Rule. (The data changing events would remain as INSERT, UPDATE, DELETE).
However, a MERGE statement might be used *within* a Rule, giving a tree
of plans rather than a single plan or a list of plans. (I can't see any
way or any reason to say no allowed to that...)

Implications


To fully support prepared queries, MERGE needs to be a query not a
utility command as I had originally envisaged. MERGE internally consists
of one driving query that is always executed in full to retrieve all
tuples generated, plus a number of other queries that may or may not get
executed according to the evaluation of the WHEN clauses.

Rule re-writing happens at the plan level. That means we need to
recognise MERGE as being a new form of query: a query that has multiple
other optional queries buried within it, each of which can be
independently re-written. Only by doing that can we apply the rules
correctly to the various parts of MERGE.

That means we probably need to introduce new infrastructure in the tcop
or executor modules to handle queries-within-queries. This isn't
special-casing MERGE so much as introducing infrastructure for a new
class of query, such as MERGE, REPLACE, INSERT ELSE UPDATE. (Merge
itself does cover almost all cases of this type of query, but we'd be
able to fairly easily support all of the different syntax).

MERGE would then be represented by a query that has many side
queries (so called so we don't confused calling them sub-queries). Each
side query can be rewritten into a list of queries. Any of those queries
could be a MERGE itself, so we might make each Query a tree of Queries.

The main query and the side queries are related, in that the values used
for the side queries come from the main query. The side queries'
external references will be replaced by parameters, in addition to other
parameters already supplied. So the side queries are parameterised, even
if the MERGE statement wasn't parameterised. The main driving query will
be executed once, the side queries once per row (if at all).

Let's look back at the infrastructure aspects. My options so far for
implementation are

0. Write MERGE in PL/pgSQL. Err, no.

1. We treat the MERGE similar to an Update-with-conditional-effects and
have the MERGE's main plan deliver tuples that we then act on during
ExecutePlan() using a new ExecMerge() function. We would preserve the
text of the side queries and execute them using SPI from within
ExecMerge(). That takes care of the rule rewriting, since we effectively
don't know what's happening inside SPI. Feels ugly, is all I can say,
though actually very similar to the way RI works.

2. Similar to (1), but instead of using SPI we start another portal from
within ExecMerge() function. Portals within portals. Seems very likely
for something to break in an unexpected way.

3. We treat a MERGE as a new kind of portal query. Any query containing
a MERGE will be forced to be a PORTAL_MULTI_QUERY. We then introduce a
new query processing routine for MERGE, ProcessQueryWithActions() rather
than running ProcessQuery() during PortalRunMulti(). This can then
execute the merge-query similarly to a cursor, retrieving tuples that
are then fed to the side queries that then run in separate portals.
Doing it this way will mean that we don't need to run the executor
recursively, we just switch from main query to retrieve next row and
then execute the appropriate side query. The executor level calls will
look almost exactly as it would if we wrote MERGE in PL/pgSQL - fetching
from a main cursor and then executing side queries as appropriate. The
hacking will have similar-ish effects to that induced by the changes for
RETURNING.

Something like

ExecutorStart(mainQueryDesc, 0);
for (;;)
{
/* FETCH 1 */
ExecutorRun(mainQueryDesc, ForwardScanDirection, 1L);
if (mainQueryDesc-estate-es_processed == 0)
break;

//identify side query 

  

Re: [HACKERS] Internal design of MERGE, with Rules

2008-04-30 Thread Sam Mason
On Wed, Apr 30, 2008 at 04:58:52PM +0100, Simon Riggs wrote:
 That means we probably need to introduce new infrastructure in the tcop
 or executor modules to handle queries-within-queries. This isn't
 special-casing MERGE so much as introducing infrastructure for a new
 class of query, such as MERGE, REPLACE, INSERT ELSE UPDATE. (Merge
 itself does cover almost all cases of this type of query, but we'd be
 able to fairly easily support all of the different syntax).
 
 MERGE would then be represented by a query that has many side
 queries (so called so we don't confused calling them sub-queries).

Why make them special cases?  (I'm sure there's a good reason!)

I've sometimes wanted to be able to put DML statements inside SELECT
statements.  The following is a slightly reasonable example:

  INSERT INTO ilog (i,ts,n)
SELECT i, now(), COUNT(*)
FROM (
  INSERT INTO bar (x,y)
SELECT 5, n
FROM baz
WHERE i = 10
RETURNING i) x
GROUP BY i;

Hum, that implies a nasty schema (ilog.ts should be in bar).  Ah well, I
hope you get the idea.  The simplest example I can think of is:

  SELECT * FROM (INSERT INTO foo (n) VALUES (1) RETURNING 1) x;

Hum, I think I may have just thought of why.  What would:

  SELECT (INSERT INTO foo (n) VALUES (f.n) RETURNING 1)
  FROM foo f;

be expected to do?  I'm thinking of nasty cases where the outer code
reading foo never stops because there's new stuff being inserted ahead
of it.  But then again, you can put the insert into a stored procedure
and it does indeed do something sensible.


  Sam

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers