On Mon, 15 Oct 2012 10:21:18 -0700, Robert Haas <robertmh...@gmail.com> wrote:
> Yeah, I think the functionality that we need is pretty much there
> already today.  What we need to do is to get the syntax to a point
> where people can write the code they want to write without getting
> tangled up by it.
> 
> I think the invention of DO was a big step in the right direction
> ...
> With DO, you can write the logic you want
> as an SQL statement, it's just a clunky and awkward SQL statement.  In
> my view the goal ought to be to refine that mechanism to remove the
> clunkiness and awkwardness, rather than to invent something completely
> new.

As someone who has worked with a number of databases now, none of them really 
get this DDL integration completely right.  What I'd like to see is 1) a 
predicate to easily test things about the schema (does this table, column, 
index, schema, etc. exist?  does it have the right type?) and 2) a way to 
conditionally execute DDL (and DML, which should fall right out, but it isn't 
really what this discussion is covering).  I would propose extending the 
current EXISTS / NOT EXISTS predicate as follows:

[NOT] EXISTS TABLE tab [ ( col [type] [ , col [type]]... ) ]
[NOT] EXISTS COLUMN tab.col [type] 
[NOT] EXISTS INDEX tab.idx [ ( col [ , col]... ) ]
[NOT] EXISTS CONSTRAINT tab.cstrt -- not sure what else might be useful here
[NOT] EXISTS ( select ) -- this is the current EXISTS predicate, of course
[NOT] EXISTS ANY ROWS FROM tab [join] [WHERE predicate] [GROUP BY col [ , 
col]...] -- exactly the same as 
                                                                                
   -- (select 1 FROM etc.)
                                                                                
   -- only because I like 
                                                                                
   -- it better
(the latter [which by no means am I nuts over; it's just that when extending 
EXISTS I can't stop neatening it up to my personal preferences] could be 
extended with [NOT] EXISTS MORE THAN n ROWS FROM and [NOT] EXISTS EXACTLY n 
ROWS FROM.)

There is a new SQL statement: IF predicate true-statement [ELSE 
false-statement].

To actually execute this new IF statement, the executor would need an IF node 
that evaluates the predicate (with ANDs and ORs, just like all SQL predicates) 
and then executes the rest only if the predicate came out TRUE (or NOT FALSE; I 
forget which is usually used, and the difference with NULL could be useful, as 
long as it matches other predicates).  This moves one more bit of procedural 
logic into the executor.  

Another wrinkle is that the dependent statement might not compile, due to 
missing tables or whatnot.  Actually executing it while it doesn't compile is 
an error, but we want to defer that error until we actually decide we need to 
execute it.  Also, it's probably good to try compiling it again at that point.  
So my thought would be to try planning the dependent statement(s); if they 
compile, hook them to the IF node; if not, hook a DEFERRED node to the IF node. 
 The DEFERRED node has the parse tree (or raw string, whatever makes sense) of 
the statement; on execution it tries again to plan that statement; if it 
succeeds, run it; if not, error out.

I'd also add a SEQUENCE node to the executor.  It just runs its children in 
order (could be n-ary, or if fixed arity nodes are what is in the 
planner/executor today, could be binary, first left, then right, and right 
could be another SEQUENCE).  The DEFERRED node means that a CREATE statement 
could precede use of what is created in the same sequence and all could get 
planned (with some deferral to execution time) in advance and run in one lump.  
This implements DO at the executor level.

The biggest concepts left from plpgsql are looping and variables.  Most 
variables could be modeled as a single row value; SQL already can update a row, 
so the planning of assignments and calculations of scalars (and arrays, I 
think) already fits into things the planner knows about.  Table variables 
(which I don't know that plpgsql supports, but someday it should) are less 
defined.  Adding plpgsql's loops to the executor would let whole functions run 
under one trip through the executor.  This is beyond just improving the DDL 
support for scripts.

I have written a number of database upgrade scripts.  Over time we've made them 
less fragile, by checking for the existence of tables, indexes, and most 
recently, columns.  The usual sequence is:
1) check the existence of an index; check that the first few columns are 
correct; if not, drop the index
2) repeat for other indexes that have changed definition over time
3) check the existence of the table; create with current layout if it is missing
4) check for the presence of a column; if missing, alter the table to add it 
(of course, we can only add new columns at the end, and occasionally delete a 
column)
5) repeat for more columns
6) check the existence of an index; if missing, create it
7) repeat for all the indexes
This is doable in most databases, but pretty messy.  You need to join with 
infoschema tables, or system tables, or use clunky functions to check for 
existence; checking types is usually pretty horrid.  Consequently, we only 
check a few things and trust that the schema is only in a few different states. 
 A true schema comparator and upgrade solver would be great, but I don't know 
anyone who has written such a thing.  The extended EXISTS predicate that could 
check tables and indexes would declutter a lot of our upgrade scripts.  That's 
the use case for me.

Much as I'd like to, I'm not volunteering to write this.  And I'm not 
volunteering anyone else either, or demanding anything, or requesting or 
pleading.  I did want to write it down and send it along just to clarify my 
thoughts.  And if someday I get time, maybe I can learn enough internals to 
write a patch.  But that day isn't today....


This message and any attachments are intended only for the use of the addressee 
and may contain information that is privileged and confidential. If the reader 
of the message is not the intended recipient or an authorized representative of 
the intended recipient, you are hereby notified that any dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please notify us immediately by e-mail and delete the message and any 
attachments from your system.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to