On 5/8/13 12:54 PM, Jonathan S. Katz wrote:
On May 8, 2013, at 1:16 PM, Tom Lane wrote:

Heikki Linnakangas <hlinnakan...@vmware.com> writes:
On 08.05.2013 19:44, Tom Lane wrote:
No there isn't; what you suggest would require FE/BE protocol
extensions, making it several orders of magnitude more work than the
other thing.

I'd imagine that the flow would go something like this:

BE      FE

CopyInResponse
        CopyData
        CopyData
        ...
        CopyDone
RowDescription
DataRow
DataRow
CommandComplete

That would require the backend to buffer the entire query response,
which isn't a great idea.  I would expect that such an operation would
need to interleave CopyData to the backend with DataRow responses.  Such
a thing could possibly be built on COPY_BOTH mode, but it would be a lot
of work (at both ends) for extremely debatable value.

The general idea of COPY is to load data as fast as possible, so weighing
it down with processing options seems like a pretty dubious idea even if
the implementation were easy.

There are cases where I do want to load data very quickly, but then need to 
perform an operation on it immediately afterward, e.g. removing bad rows that 
the COPY just added. For instance, I have this scenario:

WITH new_data AS (
        COPY FROM ...
        RETURNING id, field_to_check
)
DELETE FROM table
USING new_data
WHERE
        table.id = new_data.id AND
        new_data.field_to_check ~* 'bad data';

Now I can take care of all of that in one step, and I know I'm only removing 
rows I just added.  This comes up when I am importing external files from 
other sources where I may not necessarily want all of the rows, or where some 
of the rows contain bad data.

This also presumes that COPY works in a CTE, which I'm not sure it does (and I 
will do the TIAS test after I hit send on this message).
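Even if COPY doesn't work in a CTE, the same effect is achievable today by 
staging the load in a temp table and filtering on the way out. A rough sketch 
(the file path, target table, and column names here are just illustrative, 
borrowed from the example above):

```sql
BEGIN;

-- Stage the raw file; the temp table copies the target's column layout
-- and disappears when the transaction commits.
CREATE TEMP TABLE staging (LIKE target_table) ON COMMIT DROP;

COPY staging FROM '/path/to/import.csv' WITH (FORMAT csv);

-- Keep only the rows that pass the check; bad rows never touch the
-- real table, so no follow-up DELETE is needed.
INSERT INTO target_table
SELECT * FROM staging
WHERE field_to_check !~* 'bad data';

COMMIT;
```

It's two passes over the data instead of one, but it avoids ever having the 
bad rows visible in the target table.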

What you're really asking for here is some kind of stream processing 
capability. There are spin-offs of Postgres that provide that capability (I 
know Neil Conway worked on some). Those are geared completely around stream 
processing, but I think it would be extremely interesting to provide some 
minimal support for that in community Postgres.

Using your use case as an example, something like this would be very 
interesting:

COPY table FROM ...
  WHERE field_to_check !~* 'bad data'
;

In this case we're just applying a simple WHERE clause against each incoming 
row.

Perhaps what I'm suggesting could be implemented with a CTE, but I'm not sure 
it makes sense to do it the way you propose, at least not initially. A CTE 
would provide so much flexibility that it'd be difficult for the optimizer to 
be efficient about it. Something like a WHERE clause directly on COPY would be 
a lot easier to handle. As someone mentioned, FDW might be another option there.
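For the record, the FDW route already works for this with file_fdw; something 
along these lines (server and table names invented for illustration, columns 
taken from the example upthread):

```sql
CREATE EXTENSION file_fdw;

CREATE SERVER import_srv FOREIGN DATA WRAPPER file_fdw;

-- Expose the flat file as a relation; no data is loaded yet.
CREATE FOREIGN TABLE incoming (
    id             integer,
    field_to_check text
) SERVER import_srv
  OPTIONS (filename '/path/to/import.csv', format 'csv');

-- The filter is applied as part of the scan, with no intermediate table.
INSERT INTO target_table
SELECT id, field_to_check
FROM incoming
WHERE field_to_check !~* 'bad data';
```

The planner sees an ordinary scan-plus-filter here, which is exactly the 
"simple WHERE against each incoming row" shape, just without touching COPY 
itself.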
--
Jim C. Nasby, Data Architect                       j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)