Re: [HACKERS] COPY as a set returning function

2017-01-31 Thread Corey Huinker
>
>
> Here is a 4: Refactoring BeginCopyFrom so as instead of a Relation are
> used a TupleDesc, a default expression list, and a relation name. You
> could as well make NextCopyFrom() smarter so as it does nothing if no
> expression contexts are given by the caller, which is the case of your
> function here. To be honest, I find a bit weird to use
> NextCopyFromRawFields when there is already a bunch of sanity checks
> happening in NextCopyFrom(), where you surely want to have them
> checked even for your function.
>
> Looking at the last patch proposed, I find the current patch proposed
> as immature, as rsTupDesc basically overlaps with the relation a
> caller can define in BeginCopyFrom(), so marking this patch as
> "returned with feedback".
> --
> Michael
>

I like that suggestion and will move forward on it. I wasn't sure there
would be support for such a refactoring.

As for the copy from stdin case, Andrew Gierth laid out why that's nearly
impossible to unwind in the network protocol (the act of starting the copy
causes the query result to end before any rows were returned), and that
might make STDIN input a dead issue.


Re: [HACKERS] COPY as a set returning function

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 3:05 AM, Corey Huinker  wrote:
> On Fri, Jan 27, 2017 at 9:42 PM, Peter Eisentraut
>  wrote:
>>
>> On 1/25/17 8:51 PM, Corey Huinker wrote:
>> > # select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
>>
>> I find these parameters weird.  Just looking at this, one has no idea
>> what the "true" means.  Why not have a "filename" and a "program"
>> parameter and make them mutually exclusive?
>
>
> It was done that way to match the parameters of BeginCopyFrom() and it could
> easily be changed.
>
> I suppose I could have written it as:
>
> select * from copy_srf(filename => 'echo "x\ty"', is_program => true) as t(x
> text, y text);
>
>
> But this goes to the core of this patch/poc: I don't know where we want to
> take it next.
>
> Options at this point are:
> 1. Continue with a SRF, in which case discussions about parameters are
> completely valid.
> 2. Add a RETURNING clause to COPY. This would dispense with the parameters
> problem, but potentially create others.
> 3. Add the TupDesc parameter to BeginCopyFrom, make sure all data structures
> are visible to an extension, and call it a day. If someone wants to write an
> extension that makes their own copy_srf(), they can.
> 4. Someone else's better idea.

Here is a 4: Refactoring BeginCopyFrom so as instead of a Relation are
used a TupleDesc, a default expression list, and a relation name. You
could as well make NextCopyFrom() smarter so as it does nothing if no
expression contexts are given by the caller, which is the case of your
function here. To be honest, I find a bit weird to use
NextCopyFromRawFields when there is already a bunch of sanity checks
happening in NextCopyFrom(), where you surely want to have them
checked even for your function.

Looking at the last patch proposed, I find the current patch proposed
as immature, as rsTupDesc basically overlaps with the relation a
caller can define in BeginCopyFrom(), so marking this patch as
"returned with feedback".
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-30 Thread Corey Huinker
On Fri, Jan 27, 2017 at 9:42 PM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 1/25/17 8:51 PM, Corey Huinker wrote:
> > # select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
>
> I find these parameters weird.  Just looking at this, one has no idea
> what the "true" means.  Why not have a "filename" and a "program"
> parameter and make them mutually exclusive?


It was done that way to match the parameters of BeginCopyFrom() and it
could easily be changed.

I suppose I could have written it as:

select * from copy_srf(filename => 'echo "x\ty"', is_program => true) as
t(x text, y text);


But this goes to the core of this patch/poc: I don't know where we want to
take it next.

Options at this point are:
1. Continue with a SRF, in which case discussions about parameters are
completely valid.
2. Add a RETURNING clause to COPY. This would dispense with the parameters
problem, but potentially create others.
3. Add the TupDesc parameter to BeginCopyFrom, make sure all data
structures are visible to an extension, and call it a day. If someone wants
to write an extension that makes their own copy_srf(), they can.
4. Someone else's better idea.


Re: [HACKERS] COPY as a set returning function

2017-01-27 Thread Peter Eisentraut
On 1/27/17 8:07 PM, David Fetter wrote:
> There are still neither regression tests nor SGML documentation.
> 
> Are we at a point where we should add these things?

One could argue about the documentation at this point, since the
function is somewhat self-documenting for the time being.  But surely
there should be tests.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-27 Thread Peter Eisentraut
On 1/25/17 8:51 PM, Corey Huinker wrote:
> # select * from copy_srf('echo "x\ty"',true) as t(x text, y text);

I find these parameters weird.  Just looking at this, one has no idea
what the "true" means.  Why not have a "filename" and a "program"
parameter and make them mutually exclusive?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-27 Thread David Fetter
On Wed, Jan 25, 2017 at 08:51:38PM -0500, Corey Huinker wrote:
> I've put in some more work on this patch, mostly just taking Alvaro's
> suggestions, which resulted in big code savings.

The patch applies atop master.

The test (ls) which previously crashed the backend now doesn't, and
works in a reasonable way.

The description of the function still talks about its being a proof of
concept.

There are still neither regression tests nor SGML documentation.

Are we at a point where we should add these things?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Corey Huinker
>
>
>> I don't understand why do we have all these checks.  Can't we just pass
>> the values to COPY and let it apply the checks?  That way, when COPY is
>> updated to support multibyte escape chars (for example) we don't need to
>> touch this code.  Together with removing the unneeded braces that would
>> make these stanzas about six lines long instead of fifteen.
>>
>
> If I understand you correctly, COPY (via BeginCopyFrom) itself relies on
> having a relation in pg_class to reference for attributes.
> In this case, there is no such relation. So I'd have to fake a relcache
> entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the
> Relation and pass that along to a new function BeginCopyFromReturnSet. I'm
> happy to go that route if you think it's a good idea.
>
>
>>
>>
>> > + tuple = heap_form_tuple(tupdesc,values,nulls);
>> > + //tuple = BuildTupleFromCStrings(attinmeta,
>> field_strings);
>> > + tuplestore_puttuple(tupstore, tuple);
>>
>> No need to form a tuple; use tuplestore_putvalues here.
>>
>
> Good to know!
>
>
>
>>
>> I wonder if these should be an auxiliary function in copy.c to do this.
>> Surely copy.c itself does pretty much the same thing ...
>>
>
> Yes. This got started as a patch to core because not all of the parts of
> COPY are externally callable, and aren't broken down in a way that allowed
> for use in a SRF.
>
> I'll get to work on these suggestions.
>

I've put in some more work on this patch, mostly just taking Alvaro's
suggestions, which resulted in big code savings.

I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This
seemed the easiest way to leverage the existing tested code (and indeed, it
worked nearly out-of-the-box). The only drawback is that a minor change
will have to be made to the BeginCopyFrom() call in file_fdw.c, and any
other extensions that leverage COPY. We could make compatibility functions
that take the original signature and pass it along to the corresponding
function with rsTupDesc set to NULL.

Some issues:
- I'm still not sure if the direction we want to go is a set returning
function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program
option did to file_fdw: there's very little we can reference that isn't
os/environment specific
- Inline (STDIN) prompts the user for input, but gives the error: server
sent data ("D" message) without prior row description ("T" message). I
looked for a place where the Relation was consulted for the row
description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there
is consensus that this is the way to go.


# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
 x | y
---+---
 x | y
(1 row)

Time: 1.074 ms
# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
 x | y
---+---
 x | 4
(1 row)

Time: 1.095 ms
# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a4
>> b5
>> \.
server sent data ("D" message) without prior row description ("T" message)
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 4dfedf8..26f81f3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+   IN filename text,
+   IN is_program boolean DEFAULT false,
+   IN format text DEFAULT null,
+   IN delimiter text DEFAULT null,
+   IN null_string text DEFAULT null,
+   IN header boolean DEFAULT null,
+   IN quote text DEFAULT null,
+   IN escape text DEFAULT null,
+   IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..4e6a32c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -286,7 +287,8 @@ static const char BinarySignature[11] = 
"PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel, 
+ TupleDesc rsTupDesc,
  RawStmt *raw_query, 

Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Corey Huinker
>
> > + /* param 7: escape text default null, -- defaults to whatever
> quote is */
> > + if (PG_ARGISNULL(7))
> > + {
> > + copy_state.escape = copy_state.quote;
> > + }
> > + else
> > + {
> > + if (copy_state.csv_mode)
> > + {
> > + copy_state.escape = TextDatumGetCString(PG_GETARG_
> TEXT_P(7));
> > + if (strlen(copy_state.escape) != 1)
> > + {
> > + ereport(ERROR,
> > +
>  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +  errmsg("COPY escape must
> be a single one-byte character")));
> > + }
> > + }
> > + else
> > + {
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_
> SUPPORTED),
> > +  errmsg("COPY escape available
> only in CSV mode")));
> > + }
> > + }
>
> I don't understand why do we have all these checks.  Can't we just pass
> the values to COPY and let it apply the checks?  That way, when COPY is
> updated to support multibyte escape chars (for example) we don't need to
> touch this code.  Together with removing the unneeded braces that would
> make these stanzas about six lines long instead of fifteen.
>

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on
having a relation in pg_class to reference for attributes.
In this case, there is no such relation. So I'd have to fake a relcache
entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the
Relation and pass that along to a new function BeginCopyFromReturnSet. I'm
happy to go that route if you think it's a good idea.


>
>
> > + tuple = heap_form_tuple(tupdesc,values,nulls);
> > + //tuple = BuildTupleFromCStrings(attinmeta,
> field_strings);
> > + tuplestore_puttuple(tupstore, tuple);
>
> No need to form a tuple; use tuplestore_putvalues here.
>

Good to know!



> > + }
> > +
> > + /* close "file" */
> > + if (copy_state.is_program)
> > + {
> > + int pclose_rc;
> > +
> > + pclose_rc = ClosePipeStream(copy_state.copy_file);
> > + if (pclose_rc == -1)
> > + ereport(ERROR,
> > + (errcode_for_file_access(),
> > +  errmsg("could not close pipe to
> external command: %m")));
> > + else if (pclose_rc != 0)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_EXTERNAL_
> ROUTINE_EXCEPTION),
> > +  errmsg("program \"%s\" failed",
> > +
>  copy_state.filename),
> > +  errdetail_internal("%s",
> wait_result_to_str(pclose_rc;
> > + }
> > + else
> > + {
> > + if (copy_state.filename != NULL &&
> FreeFile(copy_state.copy_file))
> > + ereport(ERROR,
> > + (errcode_for_file_access(),
> > +  errmsg("could not close file
> \"%s\": %m",
> > +
>  copy_state.filename)));
> > + }
>
> I wonder if these should be an auxiliary function in copy.c to do this.
> Surely copy.c itself does pretty much the same thing ...
>

Yes. This got started as a patch to core because not all of the parts of
COPY are externally callable, and aren't broken down in a way that allowed
for use in a SRF.

I'll get to work on these suggestions.


Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Corey Huinker
On Wed, Jan 25, 2017 at 1:10 PM, David Fetter  wrote:

> On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:
> >
> > Feel free to mark it returned as feedback. The concept didn't
> > generate as much enthusiasm as I had hoped, so I think the right
> > thing to do now is scale it back to a patch that makes
> > CopyFromRawFields() externally visible so that extensions can use
> > them.
>
> You're getting enthusiasm in the form of these reviews and suggestions
> for improvement.  That it doesn't always happen immediately is a
> byproduct of the scarcity of developer time and the sheer volume of
> things to which people need to pay attention.


True about that. I was referring to "ooh! I need that!"-type interest. I'll
proceed, keeping in mind that there's a fallback position of just making
some of the guts of COPY available to be called by extensions like was done
for file_fdw.


Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread David Fetter
On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:
> 
> Feel free to mark it returned as feedback. The concept didn't
> generate as much enthusiasm as I had hoped, so I think the right
> thing to do now is scale it back to a patch that makes
> CopyFromRawFields() externally visible so that extensions can use
> them.

You're getting enthusiasm in the form of these reviews and suggestions
for improvement.  That it doesn't always happen immediately is a
byproduct of the scarcity of developer time and the sheer volume of
things to which people need to pay attention.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Corey Huinker
On Wed, Jan 25, 2017 at 11:57 AM, Alvaro Herrera 
wrote:

> David Fetter wrote:
>
> > @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int
> minread, int maxread)
> >errmsg("could not read
> from COPY file: %m")));
> >   break;
> >   case COPY_OLD_FE:
> > -
> >   /*
> >* We cannot read more than minread bytes (which
> in practice is 1)
> >* because old protocol doesn't have any clear way
> of separating
>
> This change is pointless as it'd be undone by pgindent.
>
> > + /*
> > +  * Function signature is:
> > +  * copy_srf( filename text default null,
> > +  *   is_program boolean default false,
> > +  *   format text default 'text',
> > +  *   delimiter text default E'\t' in text mode, ',' in csv
> mode,
> > +  *   null_string text default '\N',
> > +  *   header boolean default false,
> > +  *   quote text default '"' in csv mode only,
> > +  *   escape text default null, -- defaults to whatever
> quote is
> > +  *   encoding text default null
> > +  */
>
> This comment would be mangled by pgindent -- please add an /*--- marker
> to prevent it.
>
> > + /* param 7: escape text default null, -- defaults to whatever
> quote is */
> > + if (PG_ARGISNULL(7))
> > + {
> > + copy_state.escape = copy_state.quote;
> > + }
> > + else
> > + {
> > + if (copy_state.csv_mode)
> > + {
> > + copy_state.escape = TextDatumGetCString(PG_GETARG_
> TEXT_P(7));
> > + if (strlen(copy_state.escape) != 1)
> > + {
> > + ereport(ERROR,
> > +
>  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +  errmsg("COPY escape must
> be a single one-byte character")));
> > + }
> > + }
> > + else
> > + {
> > + ereport(ERROR,
> > + (errcode(ERRCODE_FEATURE_NOT_
> SUPPORTED),
> > +  errmsg("COPY escape available
> only in CSV mode")));
> > + }
> > + }
>
> I don't understand why do we have all these checks.  Can't we just pass
> the values to COPY and let it apply the checks?  That way, when COPY is
> updated to support multibyte escape chars (for example) we don't need to
> touch this code.  Together with removing the unneeded braces that would
> make these stanzas about six lines long instead of fifteen.
>
>
> > + tuple = heap_form_tuple(tupdesc,values,nulls);
> > + //tuple = BuildTupleFromCStrings(attinmeta,
> field_strings);
> > + tuplestore_puttuple(tupstore, tuple);
>
> No need to form a tuple; use tuplestore_putvalues here.
>
>
> > + }
> > +
> > + /* close "file" */
> > + if (copy_state.is_program)
> > + {
> > + int pclose_rc;
> > +
> > + pclose_rc = ClosePipeStream(copy_state.copy_file);
> > + if (pclose_rc == -1)
> > + ereport(ERROR,
> > + (errcode_for_file_access(),
> > +  errmsg("could not close pipe to
> external command: %m")));
> > + else if (pclose_rc != 0)
> > + ereport(ERROR,
> > + (errcode(ERRCODE_EXTERNAL_
> ROUTINE_EXCEPTION),
> > +  errmsg("program \"%s\" failed",
> > +
>  copy_state.filename),
> > +  errdetail_internal("%s",
> wait_result_to_str(pclose_rc;
> > + }
> > + else
> > + {
> > + if (copy_state.filename != NULL &&
> FreeFile(copy_state.copy_file))
> > + ereport(ERROR,
> > + (errcode_for_file_access(),
> > +  errmsg("could not close file
> \"%s\": %m",
> > +
>  copy_state.filename)));
> > + }
>
> I wonder if these should be an auxiliary function in copy.c to do this.
> Surely copy.c itself does pretty much the same thing ...
>
>
> > +DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v
> u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_
> copy_srf _null_ _null_ _null_ ));
> > +DESCR("set-returning COPY proof of concept");
>
> Why is this marked "proof of concept"?  If this is a PoC only, why are
> you submitting it as a patch?
>
> --
> Álvaro Herrerahttps://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Feel free to mark it returned as feedback. The concept didn't generate as
much enthusiasm as I had hoped, so I think the 

Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread Alvaro Herrera
David Fetter wrote:

> @@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, 
> int maxread)
>errmsg("could not read from 
> COPY file: %m")));
>   break;
>   case COPY_OLD_FE:
> -
>   /*
>* We cannot read more than minread bytes (which in 
> practice is 1)
>* because old protocol doesn't have any clear way of 
> separating

This change is pointless as it'd be undone by pgindent.

> + /*
> +  * Function signature is:
> +  * copy_srf( filename text default null,
> +  *   is_program boolean default false,
> +  *   format text default 'text',
> +  *   delimiter text default E'\t' in text mode, ',' in csv mode,
> +  *   null_string text default '\N',
> +  *   header boolean default false,
> +  *   quote text default '"' in csv mode only,
> +  *   escape text default null, -- defaults to whatever quote is
> +  *   encoding text default null
> +  */

This comment would be mangled by pgindent -- please add an /*--- marker
to prevent it.

> + /* param 7: escape text default null, -- defaults to whatever quote is 
> */
> + if (PG_ARGISNULL(7))
> + {
> + copy_state.escape = copy_state.quote;
> + }
> + else
> + {
> + if (copy_state.csv_mode)
> + {
> + copy_state.escape = 
> TextDatumGetCString(PG_GETARG_TEXT_P(7));
> + if (strlen(copy_state.escape) != 1)
> + {
> + ereport(ERROR,
> + 
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +  errmsg("COPY escape must be a 
> single one-byte character")));
> + }
> + }
> + else
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +  errmsg("COPY escape available only in 
> CSV mode")));
> + }
> + }

I don't understand why do we have all these checks.  Can't we just pass
the values to COPY and let it apply the checks?  That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code.  Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.


> + tuple = heap_form_tuple(tupdesc,values,nulls);
> + //tuple = BuildTupleFromCStrings(attinmeta, field_strings);
> + tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.


> + }
> +
> + /* close "file" */
> + if (copy_state.is_program)
> + {
> + int pclose_rc;
> +
> + pclose_rc = ClosePipeStream(copy_state.copy_file);
> + if (pclose_rc == -1)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close pipe to 
> external command: %m")));
> + else if (pclose_rc != 0)
> + ereport(ERROR,
> + 
> (errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
> +  errmsg("program \"%s\" failed",
> + copy_state.filename),
> +  errdetail_internal("%s", 
> wait_result_to_str(pclose_rc;
> + }
> + else
> + {
> + if (copy_state.filename != NULL && 
> FreeFile(copy_state.copy_file))
> + ereport(ERROR,
> + (errcode_for_file_access(),
> +  errmsg("could not close file \"%s\": 
> %m",
> + copy_state.filename)));
> + }

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...


> +DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 
> 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ 
> copy_srf _null_ _null_ _null_ ));
> +DESCR("set-returning COPY proof of concept");

Why is this marked "proof of concept"?  If this is a PoC only, why are
you submitting it as a patch?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread David Fetter
On Wed, Jan 25, 2017 at 06:16:16AM -0800, David Fetter wrote:
> On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:
> > Attached is a patch that implements copy_srf().
> > 
> > The function signature reflects cstate more than it reflects the COPY
> > options (filename+is_program instead of  FILENAME or PROGRAM, etc)
> 
> The patch as it stands needs a rebase.  I'll see what I can do today.

Please find attached a rebase, which fixes an OID collision that crept in.

- The patch builds against master and passes "make check".
- The patch does not contain user-visible documentation or examples.
- The patch does not contain tests.

I got the following when I did a brief test.

SELECT * FROM copy_srf(filename => 'ls', is_program => true) AS t(l text);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 4dfedf8..ae07cfb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+   IN filename text DEFAULT null,
+   IN is_program boolean DEFAULT false,
+   IN format text DEFAULT 'text',
+   IN delimiter text DEFAULT null,
+   IN null_string text DEFAULT E'\\N',
+   IN header boolean DEFAULT false,
+   IN quote text DEFAULT null,
+   IN escape text DEFAULT null,
+   IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..8e1bd39 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, 
int maxread)
 errmsg("could not read from 
COPY file: %m")));
break;
case COPY_OLD_FE:
-
/*
 * We cannot read more than minread bytes (which in 
practice is 1)
 * because old protocol doesn't have any clear way of 
separating
@@ -4740,3 +4740,377 @@ CreateCopyDestReceiver(void)
 
return (DestReceiver *) self;
 }
+
+Datum
+copy_srf(PG_FUNCTION_ARGS)
+{
+   ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+   TupleDesc   tupdesc;
+   Tuplestorestate *tupstore = NULL;
+   MemoryContext   per_query_ctx;
+   MemoryContext   oldcontext;
+   FmgrInfo*in_functions;
+   Oid *typioparams;
+   Oid in_func_oid;
+
+   CopyStateData   copy_state;
+   int col;
+
+   Datum   *values;
+   bool*nulls;
+
+   /* check to see if caller supports us returning a tuplestore */
+   if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("set-valued function called in context 
that cannot accept a set")));
+   }
+
+   if (!(rsinfo->allowedModes & SFRM_Materialize) || rsinfo->expectedDesc 
== NULL)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+errmsg("materialize mode required, but it is 
not allowed in this context")));
+   }
+
+   tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+   values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
+   nulls = (bool *) palloc(tupdesc->natts * sizeof(bool));
+   in_functions = (FmgrInfo *) palloc(tupdesc->natts * sizeof(FmgrInfo));
+   typioparams = (Oid *) palloc(tupdesc->natts * sizeof(Oid));
+
+   for (col = 0; col < tupdesc->natts; col++)
+   {
+   
getTypeInputInfo(tupdesc->attrs[col]->atttypid,_func_oid,[col]);
+   

Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread David Fetter
On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:
> Attached is a patch that implements copy_srf().
> 
> The function signature reflects cstate more than it reflects the COPY
> options (filename+is_program instead of  FILENAME or PROGRAM, etc)

The patch as it stands needs a rebase.  I'll see what I can do today.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-25 Thread David Fetter
On Wed, Jan 25, 2017 at 02:37:57PM +0900, Michael Paquier wrote:
> On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi  
> wrote:
> > On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker 
> > wrote:
> >>
> >> Attached is a patch that implements copy_srf().
> >
> > Moved to next CF with "needs review" status.
> 
> This patch is still waiting for review. David, are you planning to
> look at it by the end of the CF?

I'll be doing this today.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2017-01-24 Thread Michael Paquier
On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi  wrote:
> On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker 
> wrote:
>>
>> Attached is a patch that implements copy_srf().
>
> Moved to next CF with "needs review" status.

This patch is still waiting for review. David, are you planning to
look at it by the end of the CF?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2016-12-04 Thread Haribabu Kommi
On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker 
wrote:

> Attached is a patch that implements copy_srf().
>

Moved to next CF with "needs review" status.


Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] COPY as a set returning function

2016-10-31 Thread Corey Huinker
Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY
options (filename+is_program instead of  FILENAME or PROGRAM, etc)

CREATE OR REPLACE FUNCTION copy_srf(
   filenametext DEFAULT null,
   is_program  boolean DEFAULT false,
   format  text DEFAULT 'text',  /* accepts text, csv, binary */
   delimiter   text DEFAULT null,
   null_string text DEFAULT E'\\N',
   header  boolean DEFAULT false,
   quote   text DEFAULT null,
   escape  text DEFAULT null,
   encodingtext DEFAULT null)
RETURNS SETOF RECORD

The major utility for this (at least for me) will be in ETLs that currently
make a lot of use of temp tables, and wish to either reduce I/O or avoid
pg_attribute bloat.

I have not yet implemented STDIN mode, but it's a start.

$ psql test
psql (10devel)
Type "help" for help.

# select * fromcopy_srf('echo 1,2; echo 4,5',true,'csv') as t(x text, y
text);
 x | y
---+---
 1 | 2
 4 | 5
(2 rows)

Time: 1.456 ms
# select * fromcopy_srf('/tmp/small_file.txt'::text,false,'text') as
t(x text, y text);
  x  |  y
-+-
 1   | 2
 a   | b
 cde | fgh
(3 rows)


On Mon, Oct 17, 2016 at 2:33 PM, Merlin Moncure  wrote:

> On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane  wrote:
> > Craig Ringer  writes:
> >> On 1 Oct. 2016 05:20, "Tom Lane"  wrote:
> >>> I think the last of those suggestions has come up before.  It has the
> >>> large advantage that you don't have to remember a different syntax for
> >>> copy-as-a-function.
> >
> >> That sounds fantastic. It'd help this copy variant retain festure parity
> >> with normal copy. And it'd bring us closer to being able to FETCH in non
> >> queries.
> >
> > On second thought, though, this couldn't exactly duplicate the existing
> > COPY syntax, because COPY relies heavily on the rowtype of the named
> > target table to tell it what it's copying.  You'd need some new syntax
> > to provide the list of column names and types, which puts a bit of
> > a hole in the "syntax we already know" argument.  A SRF-returning-record
> > would have a leg up on that, because we do have existing syntax for
> > defining the concrete rowtype that any particular call returns.
>
> One big disadvantage of SRF-returning-record syntax is that functions
> are basically unwrappable with generic wrappers sans major gymnastics
> such as dynamically generating the query and executing it.  This is a
> major disadvantage relative to the null::type hack we use in the
> populate_record style functions and perhaps ought to make this
> (SRF-returning-record syntax) style of use discouraged for useful
> library functions.  If there were a way to handle wrapping I'd
> withdraw this minor objection -- this has come up in dblink
> discussions a few times).
>
> merlin
>
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index ada2142..0876ee1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1006,6 +1006,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+   IN filename text DEFAULT null, 
+   IN is_program boolean DEFAULT false,
+   IN format text DEFAULT 'text',
+   IN delimiter text DEFAULT null,
+   IN null_string text DEFAULT E'\\N',
+   IN header boolean DEFAULT false,
+   IN quote text DEFAULT null,
+   IN escape text DEFAULT null,
+   IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b4140eb..90ed2c5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -555,7 +556,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, 
int maxread)
 errmsg("could not read from 
COPY file: %m")));
break;
case COPY_OLD_FE:
-
/*
 * We cannot read more than minread bytes (which in 
practice is 1)
 * because old protocol doesn't have any clear way of 
separating
@@ -4555,3 +4555,377 @@ CreateCopyDestReceiver(void)
 
return (DestReceiver *) self;
 }
+
+Datum
+copy_srf(PG_FUNCTION_ARGS)
+{
+   ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+  

Re: [HACKERS] COPY as a set returning function

2016-10-17 Thread Merlin Moncure
On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane  wrote:
> Craig Ringer  writes:
>> On 1 Oct. 2016 05:20, "Tom Lane"  wrote:
>>> I think the last of those suggestions has come up before.  It has the
>>> large advantage that you don't have to remember a different syntax for
>>> copy-as-a-function.
>
>> That sounds fantastic. It'd help this copy variant retain festure parity
>> with normal copy. And it'd bring us closer to being able to FETCH in non
>> queries.
>
> On second thought, though, this couldn't exactly duplicate the existing
> COPY syntax, because COPY relies heavily on the rowtype of the named
> target table to tell it what it's copying.  You'd need some new syntax
> to provide the list of column names and types, which puts a bit of
> a hole in the "syntax we already know" argument.  A SRF-returning-record
> would have a leg up on that, because we do have existing syntax for
> defining the concrete rowtype that any particular call returns.

One big disadvantage of SRF-returning-record syntax is that functions
are basically unwrappable with generic wrappers sans major gymnastics
such as dynamically generating the query and executing it.  This is a
major disadvantage relative to the null::type hack we use in the
populate_record style functions and perhaps ought to make this
(SRF-returning-record syntax) style of use discouraged for useful
library functions.  If there were a way to handle wrapping I'd
withdraw this minor objection -- this has come up in dblink
discussions a few times).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2016-10-17 Thread Corey Huinker
On Sun, Oct 16, 2016 at 9:01 AM, Craig Ringer 
wrote:

> On 15 Oct. 2016 04:56, "Corey Huinker"  wrote:
>
> > I would like to make COPY itself a SRF. That's a bit beyond my
> capabilities, so if that is the route we want to go, I will need help.
> >
> > The syntax would probably look like this (new bits in bold):
> >
> >> WITH my_copy  AS (
> >> COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1
> text, dummy2 text, c5 date) WITH (FORMAT CSV)
> >> RETURNING c1, c2, c3
> >> )
>
> Strong -1 from me on this approach. Our CTE implementation materializes
> everything so this is no better than COPYing to a temp table.
>
> Not unless you plan to fix that (and figure out the backward compatibility
> issues since the bug is documented as a feature) or implement RETURNING in
> subqueries... I'd go for the function.
>

Well, it saves burning the oid and the pg_attribute rows. A few long
running transactions can cause pg_attribute to bloat to 400GB on one of our
systems - hence my wanting something like this function.

If it does stay a function, we only need to implement 8 of the 12 options
as parameters (FREEZE and FORCE* options don't apply). My guess is that
future options added to COPY will be more about handling output or
optimizing table inserts, neither of which mean more options for this
proposed function.

Would the best approach be to build in a core srf-returning function that
might be deprecated once COPY is set-returning AND CTEs don't have to
materialize, or to refactor what's in copy.c such that a contrib module can
easily plug into it, and have copy_srf live there?


Re: [HACKERS] COPY as a set returning function

2016-10-16 Thread Craig Ringer
On 15 Oct. 2016 04:56, "Corey Huinker"  wrote:

> I would like to make COPY itself a SRF. That's a bit beyond my
capabilities, so if that is the route we want to go, I will need help.
>
> The syntax would probably look like this (new bits in bold):
>
>> WITH my_copy  AS (
>> COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1
text, dummy2 text, c5 date) WITH (FORMAT CSV)
>> RETURNING c1, c2, c3
>> )

Strong -1 from me on this approach. Our CTE implementation materializes
everything so this is no better than COPYing to a temp table.

Not unless you plan to fix that (and figure out the backward compatibility
issues since the bug is documented as a feature) or implement RETURNING in
subqueries... I'd go for the function.


Re: [HACKERS] COPY as a set returning function

2016-10-14 Thread Corey Huinker
>
> > That sounds fantastic. It'd help this copy variant retain festure parity
> > with normal copy. And it'd bring us closer to being able to FETCH in non
> > queries.
>
> On second thought, though, this couldn't exactly duplicate the existing
> COPY syntax, because COPY relies heavily on the rowtype of the named
> target table to tell it what it's copying.  You'd need some new syntax
> to provide the list of column names and types, which puts a bit of
> a hole in the "syntax we already know" argument.  A SRF-returning-record
> would have a leg up on that, because we do have existing syntax for
> defining the concrete rowtype that any particular call returns.
>
> regards, tom lane
>


I would like to make COPY itself a SRF. That's a bit beyond my
capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy  AS (
COPY FROM 'example.csv' TO *RESULT SET(c1 text, c2 integer, dummy1
text, dummy2 text, c5 date)* WITH (FORMAT CSV)
*RETURNING c1, c2, c3*
)
SELECT ...
FROM my_copy
LEFT OUTER JOIN ref_table ...


The RESULT SET (colspecs) bit would be the rsinfo currently used by
copy_srf(). It would be nice if the CTE declaration could take types, but
it can't.

The RETURNING clause here doesn't return all the columns made available
from the COPY. That would be nice, but not required because the same
filtration could be done when the CTE is referenced. So if we require RETURNING
* be the only returning option I'd be fine with that.

If we're ok with adding a function like copy_srf() to the core, will we
still be happy with it when COPY does get a RETURNING clause?


Somewhat off-topic: here's some other observations of a n00b who spent a
fair amount of time looking at the copy.c code.

1. BeginCopyFrom and NextCopyFrom pull attlist/tupdesc info from ->rel,
repeatedly. If we were going to try to leverage that code we'd need to
store those things in a separate cstate member so that we add complexity
only in the initialization of the copy state data struct, pulling the
result structure from rsinfo rather than a relation. There's probably a
minor performance gain to be had in keeping that info around. Refactoring
those two procs to allow for a pre-set attlist/tupdesc would help.

2. NextCopyFrom() checks every single time to see if it's row 0 and if it
should skip this header row. I know a single (row_num == 0 && has_header)
isn't much extra processing, but shouldn't we digest and discard headers
before going into the per-row loop?

3. All the code that handles indexes, constraints, buffering, etc, simply
doesn't apply in the SRF context.

4. The code somewhat needlessly mixes code for the COPY FROM and COPY TO
cases. There's probably a good reason for this, but it made for a lot of
clutter in achieving my very narrow goal.


Re: [HACKERS] COPY as a set returning function

2016-09-30 Thread Tom Lane
Craig Ringer  writes:
> On 1 Oct. 2016 05:20, "Tom Lane"  wrote:
>> I think the last of those suggestions has come up before.  It has the
>> large advantage that you don't have to remember a different syntax for
>> copy-as-a-function.

> That sounds fantastic. It'd help this copy variant retain festure parity
> with normal copy. And it'd bring us closer to being able to FETCH in non
> queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying.  You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument.  A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2016-09-30 Thread Craig Ringer
On 1 Oct. 2016 05:20, "Tom Lane"  wrote:
>
> Corey Huinker  writes:
> > Attached is a _very_ rough patch implementing a proof-of-concept
function
> > copy_srf();
> > ...
> > As for that future direction, we could either have:
> > - a robust function named something like copy_srf(), with parameters for
> > all of the relevant options found in the COPY command
> > - a function that accepts an options string and parse that
> > - we could alter the grammar to make COPY RETURNING col1, col3, col5
FROM
> > 'filename' a legit CTE.
>
> I think the last of those suggestions has come up before.  It has the
> large advantage that you don't have to remember a different syntax for
> copy-as-a-function.  Once you had the framework for that, other
> rows-returning utility commands such as EXPLAIN might plug in as well,
> whenever somebody got enough of an itch for it.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY as a set returning function

2016-09-30 Thread Tom Lane
Corey Huinker  writes:
> Attached is a _very_ rough patch implementing a proof-of-concept function
> copy_srf();
> ...
> As for that future direction, we could either have:
> - a robust function named something like copy_srf(), with parameters for
> all of the relevant options found in the COPY command
> - a function that accepts an options string and parse that
> - we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM
> 'filename' a legit CTE.

I think the last of those suggestions has come up before.  It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function.  Once you had the framework for that, other
rows-returning utility commands such as EXPLAIN might plug in as well,
whenever somebody got enough of an itch for it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] COPY as a set returning function

2016-09-30 Thread Corey Huinker
Attached is a _very_ rough patch implementing a proof-of-concept function
copy_srf();

It allows you to do things like this:

# select a,c,e from copy_srf('echo 12,345,67,89,2016-01-01',true) as t(a
integer, b text, c text, d text, e date);
 a  | c  | e
++
 12 | 67 | 2016-01-01
(1 row)



Uses for this include:
- avoiding the pattern of creating a temp table just to select all the rows
back out and then discard the table (avoidable disk activity, avoidable oid
churn)
- avoiding the pattern of creating a file_fdw table in pg_temp just to drop
it after one select (avoidable oid churn)
- filtering file/program input by the columns that are relevant to the
user's needs.

This experiment arose from my realization that file_fdw just plugs into the
externally visible portions of copy.c to do all of it's work. So why
couldn't we do the same for a set returning function? Of course it wasn't
as simple as that. The existing Begin/NextCopyFrom functions require the
->rel to be a valid Oid...which we won't have in this context, so I had to
bypass that code and use CopyFromRawFields() directly...which isn't
externally visible, hence this being a patch to core rather than an
extension.

Currently the function only accepts two parameters, "filename" and
"is_program". Header is always false and csv mode is always true. Obviously
if we go forward on this, we'll want to add that functionality back in, but
I'm holding off for now to keep the example simple and wait for consensus
on future direction.

As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for
all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM
'filename' a legit CTE.

Regardless of the path forward, I'm going to need help in getting there,
hence this email. Thank you for your consideration.


copy_as_a_srf.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers