On Thu, Mar 18, 2004 at 10:16:36AM -0500, Tom Lane wrote:
> Passing in a relation OID is probably a bad idea anyway, as it ties this
> API to the assumption that COPY is only for complete relations.  There's
> been talk before of allowing a SELECT result to be presented via the
> COPY protocol, for instance.  What might be a more usable API is
> 
> COPY OUT:
>               function formatter_out(text[]) returns text
> COPY IN:
>               function formatter_in(text) returns text[]
> 
> where the text array is either the results of or the input to the
> per-column datatype I/O routines.  This makes it explicit that the
> formatter's job is solely to determine the column-level wrapping and
> unwrapping of the data.  I'm assuming here that there is no good reason
> for the formatter to care about the specific datatypes involved; can you
> give a counterexample?

 The idea was to pass the maximum information about the tuple to the
 formatter; what the formatter does with that information is the
 formatter's problem.
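
 For reference, the pair you propose could look roughly like this. It is
 only a sketch in Python: the function names follow your proposal, but
 the tab-separated layout, the backslash escapes and the _unescape()
 helper are my assumptions for illustration, and NULL handling is left
 out entirely.

```python
# Hypothetical sketch of the proposed formatter pair. The tab-separated
# layout and the escape rules are assumptions, not an agreed format.

_ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}


def formatter_out(columns):
    """Wrap already-formatted column texts into one complete output line."""
    escaped = [
        c.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
        for c in columns
    ]
    return "\t".join(escaped) + "\n"


def _unescape(field):
    """Reverse the backslash escapes applied by formatter_out."""
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def formatter_in(line):
    """Unwrap one complete input line back into column texts."""
    body = line[:-1] if line.endswith("\n") else line
    # Raw tabs only occur as separators, because data tabs are escaped.
    return [_unescape(f) for f in body.split("\t")]
```

 Neither function sees a datatype or a relation OID here, only the text
 produced by (or destined for) the per-column I/O routines.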

> >  It's pity  that main idea of  current COPY is based  on separated lines
> >  and it is not more common interface for streaming data between FE and BE.
> 
> Yeah, that was another concern I had.  This API would let the formatter
> control line-level layout but it would not eliminate the hard-wired
> significance of newline.  What's worse, there isn't any clean way to
> deal with reading quoted newlines --- the formatter can't really replace
> the default quoting rules if the low-level code is going to decide
> whether a newline is quoted or not.

 I think the latest protocol version works with blocks of data rather
 than with lines, and the client-side PQputCopyData() passes a block --
 only the docs say that it is a table row.

> We could possibly solve that by specifying that the text output or input
> (respectively) is the complete line sent to or from the client,
> including newline or whatever other line-level formatting you are using.
> This still leaves the problem of how the low-level COPY IN code knows
> what is a complete line to pass off to the formatter_in routine.  We
> could possibly fix this by adding a second input-control routine
> 
>       function formatter_linelength(text) returns integer
> 
> which is defined to return -1 if the input isn't a complete line yet

 But formatter_linelength() will need some context information, I think.
 In other words, some struct with formatter-specific internal data. And
 for more difficult formats like XML you need some other context data
 (parser data) too.

 Maybe there could be some global exported struct (as there is for
 triggers) that functions written in C can use. That means for simple
 formats like CSV you could use non-C functions, and for formats like XML
 you could use C functions. And if it is interesting for PL developers,
 they can add support for access to this struct to their languages.
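
 To show what I mean by context, here is a rough sketch in Python. The
 FormatterContext struct, its fields and linelength_with_context() are
 hypothetical names of mine, and the double-quote rule is an assumed
 CSV-like convention. The point is only that the formatter keeps its own
 state between calls instead of rescanning the buffer from the start.

```python
from dataclasses import dataclass


@dataclass
class FormatterContext:
    """Hypothetical per-COPY state kept between formatter calls."""
    scanned: int = 0          # characters of the buffer already examined
    in_quotes: bool = False   # quote state at position `scanned`


def linelength_with_context(buf, ctx):
    """Return the length of the first complete line in buf, or -1.

    Scanning resumes at ctx.scanned, so data seen in earlier calls is
    not examined again.
    """
    for i in range(ctx.scanned, len(buf)):
        ch = buf[i]
        if ch == '"':
            ctx.in_quotes = not ctx.in_quotes
        elif ch == "\n" and not ctx.in_quotes:
            # Line complete: reset the state for the next line, which the
            # caller will place at the start of the buffer.
            ctx.scanned = 0
            ctx.in_quotes = False
            return i + 1
    ctx.scanned = len(buf)
    return -1
```

 An XML formatter would keep its parser state in the same place, which is
 why I think a C-level struct is the natural home for it.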

> (i.e., read some more data, append to the buffer, and try again), or
> >= 0 to indicate that the first N bytes of the buffer represent a
> complete line to be passed off to formatter_in.  I don't see a way to
> combine formatter_in and formatter_linelength into a single function
> without relying on "out" parameters, which would again confine the
> feature to format functions written in C.
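
 If I read the contract correctly, the low-level COPY IN loop would look
 something like the sketch below (Python, for brevity). The
 split_lines() driver and the double-quote rule are my assumptions, and
 the sketch counts characters where the real code would count bytes.

```python
def formatter_linelength(buf):
    """Return N if the first N characters of buf form a complete line,
    or -1 if more data is needed.

    Assumed rule: a newline inside double quotes does not end the line.
    """
    in_quotes = False
    for i, ch in enumerate(buf):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes:
            return i + 1
    return -1


def split_lines(chunks):
    """Hypothetical driver: append each chunk to the buffer, ask the
    formatter whether a line is complete, and hand off complete lines."""
    buf = ""
    lines = []
    for chunk in chunks:
        buf += chunk
        while True:
            n = formatter_linelength(buf)
            if n < 0:
                break  # incomplete: read more data and try again
            lines.append(buf[:n])  # this is what formatter_in would get
            buf = buf[n:]
    return lines, buf
```

 With this split the low-level code never decides by itself whether a
 newline is quoted; it only trusts the returned length.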

> It's a tad annoying that we need two functions for input.  One way that
> we could still keep the COPY option syntax to be just
>       FORMAT csv
> is to create an arbitrary difference in the signatures of the input
> functions.  Then we could have coexisting functions
>       csv(text[]) returns text
>       csv(text) returns text[]
>       csv(text, ...) returns int
> that are referenced by "FORMAT csv".

 That sounds good, but I think neither of us is fully sure about it yet,
 right? CSV support would probably be better added as an extension of
 DELIMITER.

    Karel

-- 
 Karel Zak  <[EMAIL PROTECTED]>
 http://home.zf.jcu.cz/~zakkr/
