"Tom Lane" <[EMAIL PROTECTED]> writes:

> Dimitri Fontaine <[EMAIL PROTECTED]> writes:
>> Here's a proposal for COPY to support the T part of an ETL, that is adding 
>> the 
>> capability for COPY FROM to Transform the data it gets.
>
>> The idea is quite simple: adding to COPY FROM the option to run a function 
>> on 
>> the data before to call datatype_in functions.
>
> The major concern I have about this is to ensure that no detectable
> overhead is added to COPY when the feature isn't being used.
>
> I am not actually convinced that the column-by-column design you seem to
> have in mind is worth anything.  The examples that I remember seeing
> often involve removing columns, generating one column from multiple ones
> or vice versa, dealing with nonstandard column delimiters, etc.  What
> would makes sense in my mind is a single function taking and returning
> text, which is invoked once on each complete input line before it is
> broken into fields.

I think not having to deal with separating fields is actually one of the few
reasons to do this within COPY. If you can separate out yourself or need to do
something more clever than COPY is capable of to split the columns then you're
better off preprocessing it with perl or something anyways.

To that end all the other use cases you describe could be handled with his
plan. There's nothing stopping you from doing

CREATE READER foo (a integer, b integer)
INSERT INTO b (SELECT a+b FROM foo);
or
INSERT INTO b (SELECT 1, a, b, greatest(a,b) FROM foo)

However I'm not sure we even need new syntax for CREATE READER. I would think
something like this would make more sense:

CREATE FUNCTION transform(integer, integer) RETURNS SETOF b;

COPY b FROM 'foo' USING transform(integer,integer);

> So the whole thing seems just marginally attractive to me.

Everything about ETL is only marginally attractive, but it's something people
spend a lot of time doing. Nobody's come up with any particularly clean
solutions I think.

AFAIK the state of the art is actually to load the data into a table which
closely matches the source material, sometimes just columns of text. Then copy
it all to another table doing transformations. Not impressed.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to