Re: [HACKERS] Internal design of MERGE, with Rules
On Wed, 2008-04-30 at 16:58 +0100, Simon Riggs wrote: The main query will then look like this select target.ctid ,case when-not-matched (as above) ,case when-matched (as above) ,(all other columns required for side queries) from source-query left outer join target on join-condition where (when-matched-condition-0 or when-matched-condition-1 ... or when-matched-condition-N) or (when-not-matched-condition-0 or when-not-matched-condition-1 ... or when-not-matched-condition-N) The WHERE clause is likely required in case we get queries like this MERGE target t USING (select * from source) s ON (s.pkey = t.pkey) WHEN MATCHED AND s.pkey = $1 UPDATE SET col = $2; which would be perfectly valid, even if we might hope that they had coded like this MERGE target USING (select * from source WHERE index-column = $1) ON (join-condition) WHEN MATCHED UPDATE SET col = $2; Peter has just jogged my memory about double evaluation of volatile functions, so the above transformation isn't correct. We would not be able to fully optimise a MERGE statement like this MERGE target t USING (select * from source) s ON (s.pkey = t.pkey) WHEN MATCHED AND s.key = $1 UPDATE SET col = $2; since we won't be able to pass the clause s.pkey = $1 down into the s query so it would use an index. The following statement will be faster, but will in all cases give an identical result: MERGE target t USING (select * from source WHERE key = $1) s ON (s.pkey = t.pkey) WHEN MATCHED UPDATE SET col = $2; I don't think its too important, since the latter is the way people would have used MERGE in SQL:2003 anyway. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Internal design of MERGE, with Rules
On Thu, 2008-05-01 at 00:26 +0100, Sam Mason wrote: On Wed, Apr 30, 2008 at 04:58:52PM +0100, Simon Riggs wrote: That means we probably need to introduce new infrastructure in the tcop or executor modules to handle queries-within-queries. This isn't special-casing MERGE so much as introducing infrastructure for a new class of query, such as MERGE, REPLACE, INSERT ELSE UPDATE. (Merge itself does cover almost all cases of this type of query, but we'd be able to fairly easily support all of the different syntax). MERGE would then be represented by a query that has many side queries (so called so we don't confused calling them sub-queries). Why make them special cases? (I'm sure there's a good reason!) Well, I'm not making them special cases. The infrastructure would be generalised for any statement type that wanted to do roughly this. I've sometimes wanted to be able to put DML statements inside SELECT statements. The following is a slightly reasonable example: INSERT INTO ilog (i,ts,n) SELECT i, now(), COUNT(*) FROM ( INSERT INTO bar (x,y) SELECT 5, n FROM baz WHERE i = 10 RETURNING i) x GROUP BY i; OK, but that's not what I'm working on... useful as it sounds. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Internal design of MERGE, with Rules
MERGE looks like it may need some new infrastructure to support it, depending upon the implementation route. Guidance and discussion requested from main hackers, if possible. This is a separate post because there's no further discussion here on the external behaviour of MERGE, or its concurrency/locking. Assumptions --- * MERGE must be a parameterisable, preparable command. The parameters might be anywhere in the statement i.e. in the USING clause, or in the action statements, or both. * MERGE must work like any other statement, e.g. multi-statement requests must work also e.g. UPDATE...; MERGE ... ;INSERT ... * MERGE works on tables only, not views. PostgreSQL doesn't support updateable views, so this isn't a problem to solve as part of this implementation. * MERGE contains possibly multiple INSERTs, UPDATEs and DELETEs, each of which can invoke a rule to generate even more commands. * We would not add MERGE to the list of possible events that invoke a Rule. (The data changing events would remain as INSERT, UPDATE, DELETE). However, a MERGE statement might be used *within* a Rule, giving a tree of plans rather than a single plan or a list of plans. (I can't see any way or any reason to say no allowed to that...) Implications To fully support prepared queries, MERGE needs to be a query not a utility command as I had originally envisaged. MERGE internally consists of one driving query that is always executed in full to retrieve all tuples generated, plus a number of other queries that may or may not get executed according to the evaluation of the WHEN clauses. Rule re-writing happens at the plan level. That means we need to recognise MERGE as being a new form of query: a query that has multiple other optional queries buried within it, each of which can be independently re-written. Only by doing that can we apply the rules correctly to the various parts of MERGE. That means we probably need to introduce new infrastructure in the tcop or executor modules to handle queries-within-queries. This isn't special-casing MERGE so much as introducing infrastructure for a new class of query, such as MERGE, REPLACE, INSERT ELSE UPDATE. (Merge itself does cover almost all cases of this type of query, but we'd be able to fairly easily support all of the different syntax). MERGE would then be represented by a query that has many side queries (so called so we don't confused calling them sub-queries). Each side query can be rewritten into a list of queries. Any of those queries could be a MERGE itself, so we might make each Query a tree of Queries. The main query and the side queries are related, in that the values used for the side queries come from the main query. The side queries' external references will be replaced by parameters, in addition to other parameters already supplied. So the side queries are parameterised, even if the MERGE statement wasn't parameterised. The main driving query will be executed once, the side queries once per row (if at all). Let's look back at the infrastructure aspects. My options so far for implementation are 0. Write MERGE in PL/pgSQL. Err, no. 1. We treat the MERGE similar to an Update-with-conditional-effects and have the MERGE's main plan deliver tuples that we then act on during ExecutePlan() using a new ExecMerge() function. We would preserve the text of the side queries and execute them using SPI from within ExecMerge(). That takes care of the rule rewriting, since we effectively don't know what's happening inside SPI. Feels ugly, is all I can say, though actually very similar to the way RI works. 2. Similar to (1), but instead of using SPI we start another portal from within ExecMerge() function. Portals within portals. Seems very likely for something to break in an unexpected way. 3. We treat a MERGE as a new kind of portal query. Any query containing a MERGE will be forced to be a PORTAL_MULTI_QUERY. We then introduce a new query processing routine for MERGE, ProcessQueryWithActions() rather than running ProcessQuery() during PortalRunMulti(). This can then execute the merge-query similarly to a cursor, retrieving tuples that are then fed to the side queries that then run in separate portals. Doing it this way will mean that we don't need to run the executor recursively, we just switch from main query to retrieve next row and then execute the appropriate side query. The executor level calls will look almost exactly as it would if we wrote MERGE in PL/pgSQL - fetching from a main cursor and then executing side queries as appropriate. The hacking will have similar-ish effects to that induced by the changes for RETURNING. Something like ExecutorStart(mainQueryDesc, 0); for (;;) { /* FETCH 1 */ ExecutorRun(mainQueryDesc, ForwardScanDirection, 1L); if (mainQueryDesc-estate-es_processed == 0) break; //identify side query
Re: [HACKERS] Internal design of MERGE, with Rules
On Wed, Apr 30, 2008 at 04:58:52PM +0100, Simon Riggs wrote: That means we probably need to introduce new infrastructure in the tcop or executor modules to handle queries-within-queries. This isn't special-casing MERGE so much as introducing infrastructure for a new class of query, such as MERGE, REPLACE, INSERT ELSE UPDATE. (Merge itself does cover almost all cases of this type of query, but we'd be able to fairly easily support all of the different syntax). MERGE would then be represented by a query that has many side queries (so called so we don't confused calling them sub-queries). Why make them special cases? (I'm sure there's a good reason!) I've sometimes wanted to be able to put DML statements inside SELECT statements. The following is a slightly reasonable example: INSERT INTO ilog (i,ts,n) SELECT i, now(), COUNT(*) FROM ( INSERT INTO bar (x,y) SELECT 5, n FROM baz WHERE i = 10 RETURNING i) x GROUP BY i; Hum, that implies a nasty schema (ilog.ts should be in bar). Ah well, I hope you get the idea. The simplest example I can think of is: SELECT * FROM (INSERT INTO foo (n) VALUES (1) RETURNING 1) x; Hum, I think I may have just thought of why. What would: SELECT (INSERT INTO foo (n) VALUES (f.n) RETURNING 1) FROM foo f; be expected to do? I'm thinking of nasty cases where the outer code reading foo never stops because there's new stuff being inserted ahead of it. But then again, you can put the insert into a stored procedure and it does indeed do something sensible. Sam -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers