The recent addition of "drop column" makes it possible to "change" a column's type. For example: I have written a small utility that reads Medline query results and dumps them into a Postgres database for further analysis.

[ Two remarks before starting: 1) I *know* that I'm reinventing the wheel and that you can get those results in a nice XML format; but the problem is that I do not (yet) know XML, and have no time to write and use an XML parser, while I know how to parse a simple line-oriented format; and 2) no, the possible fields are not a known fixed set: Medline has already changed its format three times in the last two years, so the reading routines have to be dynamic. Alas ... ]

These results are a "field-by-field" enumeration, i.e. each record is a set of <tag> - <value> pairs (well, not exactly: some fields span more than one line, but I'm simplifying ...). Some fields may be missing, others may be repeated.

The obvious solution is to create a temporary table with a set of key columns identifying the record, the tag and the source file, plus a "value" column, which had better be of type "text".
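A minimal sketch of such a staging table (the names are of course my own invention, not Medline's):

    create table medline_raw (
        source_file text,     -- file the record came from
        record_id   integer,  -- record number within that file
        tag         text,     -- Medline field tag
        value       text      -- raw field contents, one row per occurrence
    );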

It is almost trivial to create (programmatically) the structure of the main table (a record identifier, one or two columns identifying the source, and a column for each field present at most once in the DB) and of the auxiliary tables containing the repeated fields.
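For instance (the field names are invented; the real set is whatever the parser finds in the input):

    -- main table: one column per field occurring at most once
    create table medline (
        record_id    integer primary key,
        source_file  text,
        title        text,
        date_revised text   -- still "text" at this stage, see below
    );

    -- one auxiliary table per repeatable field
    create table medline_author (
        record_id integer references medline,
        author    text
    );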

But this table initially has all its values as "text", which is a) horribly inefficient, b) hard to use from outside programs (M$ tools will choke on an indexed text field) and c) quite unsuited to certain queries (e.g. comparisons of dates).

Furthermore, this kind of modification cannot always be made at read time. It happens that only the *semantics* of the problem at hand allow one to choose the *right* column type. In my example, the date of last Medline indexing is formally a timestamp; but for all practical purposes, a date is more than sufficient.

"drop columns" allows to fix this dynmically, along the line of "alter table x add column new_y newtype; update x set new_y=y::newtype, alter table x drop column y; alter table x rename column new_y to y;". This has two side effects :

This has two side effects. The first one is unimportant (at least from a relational point of view): the order of the columns in the table is lost.

The other one is *most* important: any Postgres object using the column definition (I'm thinking of indexes, views and rules, but there might be others) will have to be redefined.

What I'm thinking of would be something along the lines of "alter table x alter column y retype to newtype", doing essentially the same as above, but *keeping the old column id* and *propagating* the change as needed to the defined views and indexes (maybe with a "cascade" option, deleting objects that cannot be updated, as needed?).
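For the Medline date example, that would reduce the whole dance above to a single statement (the syntax is of course only a suggestion):

    alter table medline alter column date_revised retype to date;

with any index or view on date_revised rebuilt automatically against the new type.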

Am I dreaming awake? Or is this implementable? And sufficiently "generally useful"?

Comments?

Emmanuel Charpentier

