subject:"\[HACKERS\] raw output from copy"

Re: [HACKERS] raw output from copy

2016-12-06 Thread Kohei KaiGai

2016-12-06 16:59 GMT+09:00 Pavel Stehule :
>
>
> 2016-12-06 1:50 GMT+01:00 Kohei KaiGai :
>>
>> 2016-12-05 22:45 GMT+09:00 Pavel Stehule :
>> >
>> > There are more goals:
>> >
>> > 1. user friendly import of text or binary data - import text data (with
>> > psql) from file is possible - but you have to load a content to psql
>> > variable. For binary data you should to use workaround based on LO and
>> > transformation from LO to bytea.
>> >
>> > 2. user friendly export text or binary data - now, the binary data can
>> > be
>> > exported only via transformation to LO. The XML has very interesting
>> > features when is passing from/to client binary. This feature is
>> > impossible
>> > in psql now.
>> >
>>   :
>> 
>>   :
>> >> It seems to me extend of COPY statement for this optimization is a bit
>> >> overkill
>> >> solution. Do we find out an alternative solution that we can build on
>> >> the existing
>> >> infrastructure?
>> >
>> > The advantage and sense of COPY RAW was reusing existing interface. The
>> > question was: How I can export/import binary data simply from psql
>> > console?
>> >
>> OK, I could get your point.
>>
>> Likeky, we can implement the feature without COPY statement enhancement
>> by adding a special purpose function and \xxx command on psql.
>>
>> Let's assume the two commands below on psql:
>>
>> \blob_import   (STDIN|)
>> \blob_export  (STDOUT|)
>>
>> On \blob_import, the psql command reads the binary contents from either
>> stdin or file, than call a special purpose function that takes three
>> arguments; table name, column name and a binary data chunk.
>> PQexecParams() of libpq allows to deliver the data chunk with keeping
>> binary data format, then the special purpose function will be able to
>> lookup the destination table/column and construct a tuple that contains
>> the supplied data chunk. (I think _recv handler shall be used for
>> data validation, but not an element of this feature.)
>>
>>
>> On \blob_export, the psql command also set up a simple query as follows:
>>   SELECT blob_export((> For example,
>>   \blob_export SELECT binary_data FROM my_table WHERE id = 10   /tmp/aaa
>> shall be transformed to
>>   SELECT blob_export((SELECT binary_data FROM my_table WHERE id = 10))
>
>
> This is reason why I prefer a COPY statement - because it does all necessary
> things natural.  But if there is a disagreement against COPY RAW it can be
> implemented as psql commands.
>
Yes, both of approach will be able to implement what you want to do.
I agree it is valuable if psql can import/export a particular item with
simple shell-script description, however, here is no consensus how
to implement it.

If psql supports the special \xxx command, it is equivalently convenient
from the standpoint of users, with no enhancement of the statement.

I hope committers comment on the approach we will take on.

Thanks,

> export should be similar like \g, \gset feature
>
> so
>
> SELECT xmldoc FROM 
> \gbinary_store .xxx
>
> import is maybe better solved by proposed file references in queries
>
> Regards
>
> Pavel
>
>
>>
>>
>> This function is declared as:
>>   blob_export(anyelement) RETURNS bytea
>> So, as long as the user supplied query returns exactly one column and
>> one row, it can transform the argument to the binary stream, then psql
>> command receive it and dump somewhere; stdout or file.
>>
>> How about your thought?
>>
>> Thanks,
>> --
>> KaiGai Kohei 
>
>



-- 
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-12-06 Thread Pavel Stehule

2016-12-06 1:50 GMT+01:00 Kohei KaiGai :

> 2016-12-05 22:45 GMT+09:00 Pavel Stehule :
> >
> > There are more goals:
> >
> > 1. user friendly import of text or binary data - import text data (with
> > psql) from file is possible - but you have to load a content to psql
> > variable. For binary data you should to use workaround based on LO and
> > transformation from LO to bytea.
> >
> > 2. user friendly export text or binary data - now, the binary data can be
> > exported only via transformation to LO. The XML has very interesting
> > features when is passing from/to client binary. This feature is
> impossible
> > in psql now.
> >
>   :
> 
>   :
> >> It seems to me extend of COPY statement for this optimization is a bit
> >> overkill
> >> solution. Do we find out an alternative solution that we can build on
> >> the existing
> >> infrastructure?
> >
> > The advantage and sense of COPY RAW was reusing existing interface. The
> > question was: How I can export/import binary data simply from psql
> console?
> >
> OK, I could get your point.
>
> Likeky, we can implement the feature without COPY statement enhancement
> by adding a special purpose function and \xxx command on psql.
>
> Let's assume the two commands below on psql:
>
> \blob_import   (STDIN|)
> \blob_export  (STDOUT|)
>
> On \blob_import, the psql command reads the binary contents from either
> stdin or file, than call a special purpose function that takes three
> arguments; table name, column name and a binary data chunk.
> PQexecParams() of libpq allows to deliver the data chunk with keeping
> binary data format, then the special purpose function will be able to
> lookup the destination table/column and construct a tuple that contains
> the supplied data chunk. (I think _recv handler shall be used for
> data validation, but not an element of this feature.)
>
>
> On \blob_export, the psql command also set up a simple query as follows:
>   SELECT blob_export(( For example,
>   \blob_export SELECT binary_data FROM my_table WHERE id = 10   /tmp/aaa
> shall be transformed to
>   SELECT blob_export((SELECT binary_data FROM my_table WHERE id = 10))
>

This is reason why I prefer a COPY statement - because it does all
necessary things natural.  But if there is a disagreement against COPY RAW
it can be implemented as psql commands.

export should be similar like \g, \gset feature

so

SELECT xmldoc FROM 
\gbinary_store .xxx

import is maybe better solved by proposed file references in queries

Regards

Pavel



>
> This function is declared as:
>   blob_export(anyelement) RETURNS bytea
> So, as long as the user supplied query returns exactly one column and
> one row, it can transform the argument to the binary stream, then psql
> command receive it and dump somewhere; stdout or file.
>
> How about your thought?
>
> Thanks,
> --
> KaiGai Kohei 
>

Re: [HACKERS] raw output from copy

2016-12-05 Thread Kohei KaiGai

2016-12-05 22:45 GMT+09:00 Pavel Stehule :
>
> There are more goals:
>
> 1. user friendly import of text or binary data - import text data (with
> psql) from file is possible - but you have to load a content to psql
> variable. For binary data you should to use workaround based on LO and
> transformation from LO to bytea.
>
> 2. user friendly export text or binary data - now, the binary data can be
> exported only via transformation to LO. The XML has very interesting
> features when is passing from/to client binary. This feature is impossible
> in psql now.
>
  :

  :
>> It seems to me extend of COPY statement for this optimization is a bit
>> overkill
>> solution. Do we find out an alternative solution that we can build on
>> the existing
>> infrastructure?
>
> The advantage and sense of COPY RAW was reusing existing interface. The
> question was: How I can export/import binary data simply from psql console?
>
OK, I could get your point.

Likeky, we can implement the feature without COPY statement enhancement
by adding a special purpose function and \xxx command on psql.

Let's assume the two commands below on psql:

\blob_import   (STDIN|)
\blob_export  (STDOUT|)

On \blob_import, the psql command reads the binary contents from either
stdin or file, than call a special purpose function that takes three
arguments; table name, column name and a binary data chunk.
PQexecParams() of libpq allows to deliver the data chunk with keeping
binary data format, then the special purpose function will be able to
lookup the destination table/column and construct a tuple that contains
the supplied data chunk. (I think _recv handler shall be used for
data validation, but not an element of this feature.)


On \blob_export, the psql command also set up a simple query as follows:
  SELECT blob_export((


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-12-05 Thread Pavel Stehule

Hi

2016-12-05 14:19 GMT+01:00 Kohei KaiGai :

> Sorry for my late response.
>
> I've briefly checked a series of discussion in the past.
> I understood the target/purpose of this patch is provision of a fast
> interface
> to import/export a particular cell of a relation, by skip of text<->binary
> transformation. Its typical use case are XML and JSON data types. Right?
>

There are more goals:

1. user friendly import of text or binary data - import text data (with
psql) from file is possible - but you have to load a content to psql
variable. For binary data you should to use workaround based on LO and
transformation from LO to bytea.

2. user friendly export text or binary data - now, the binary data can be
exported only via transformation to LO. The XML has very interesting
features when is passing from/to client binary. This feature is impossible
in psql now.

>
> If so, how about the idea to use fast-path invocation protocol to call
> functions
> to import/export these document types?
> It allows to accept binary form of the data stream, with minimum overheads.
>

Sorry, I don't see a sense - for Fast API I have to write a application -
and then I can write a application with support of binary passing.

>
> It seems to me extend of COPY statement for this optimization is a bit
> overkill
> solution. Do we find out an alternative solution that we can build on
> the existing
> infrastructure?
>

The advantage and sense of COPY RAW was reusing existing interface. The
question was: How I can export/import binary data simply from psql console?

Regards

Pavel

>
> Best regards,
>
> 2016-12-05 14:16 GMT+09:00 Haribabu Kommi :
> >
> >
> > On Tue, Nov 22, 2016 at 10:48 PM, Haribabu Kommi <
> kommi.harib...@gmail.com>
> > wrote:
> >>
> >>  Hi,
> >>
> >> This is a gentle reminder.
> >>
> >> you assigned as reviewer to the current patch in the 11-2016 commitfest.
> >> But you haven't shared your review yet. Please share your review about
> >> the patch. This will help us in smoother operation of commitfest.
> >>
> >> Please Ignore if you already shared your review.
> >
> >
> > Patch is not applying properly to HEAD.
> > Moved to next CF with "waiting on author" status.
> >
> >
> > Regards,
> > Hari Babu
> > Fujitsu Australia
>
>
>
> --
> KaiGai Kohei 
>

Re: [HACKERS] raw output from copy

2016-12-05 Thread Kohei KaiGai

Sorry for my late response.

I've briefly checked a series of discussion in the past.
I understood the target/purpose of this patch is provision of a fast interface
to import/export a particular cell of a relation, by skip of text<->binary
transformation. Its typical use case are XML and JSON data types. Right?

If so, how about the idea to use fast-path invocation protocol to call functions
to import/export these document types?
It allows to accept binary form of the data stream, with minimum overheads.

It seems to me extend of COPY statement for this optimization is a bit overkill
solution. Do we find out an alternative solution that we can build on
the existing
infrastructure?

Best regards,

2016-12-05 14:16 GMT+09:00 Haribabu Kommi :
>
>
> On Tue, Nov 22, 2016 at 10:48 PM, Haribabu Kommi 
> wrote:
>>
>>  Hi,
>>
>> This is a gentle reminder.
>>
>> you assigned as reviewer to the current patch in the 11-2016 commitfest.
>> But you haven't shared your review yet. Please share your review about
>> the patch. This will help us in smoother operation of commitfest.
>>
>> Please Ignore if you already shared your review.
>
>
> Patch is not applying properly to HEAD.
> Moved to next CF with "waiting on author" status.
>
>
> Regards,
> Hari Babu
> Fujitsu Australia

-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-12-04 Thread Haribabu Kommi

On Tue, Nov 22, 2016 at 10:48 PM, Haribabu Kommi 
wrote:

>  Hi,
>
> This is a gentle reminder.
>
> you assigned as reviewer to the current patch in the 11-2016 commitfest.
> But you haven't shared your review yet. Please share your review about
> the patch. This will help us in smoother operation of commitfest.
>
> Please Ignore if you already shared your review.
>

Patch is not applying properly to HEAD.
Moved to next CF with "waiting on author" status.


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] raw output from copy

2016-11-22 Thread Haribabu Kommi

 Hi,

This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.
But you haven't shared your review yet. Please share your review about
the patch. This will help us in smoother operation of commitfest.

Please Ignore if you already shared your review.


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] raw output from copy

2016-10-02 Thread Michael Paquier

On Sat, Jul 16, 2016 at 5:55 PM, Pavel Stehule  wrote:
> I am sending fresh version of COPY RAW patch.

Moved to next CF per this status.

+++ b/src/interfaces/libpq/test/copy-raw-regress.pl
@@ -0,0 +1,48 @@
+#!/usr/bin/perl -w
+
+use strict;
I don't understand why this is shaped this way, I mean the perl part
if we have the TAP infra in place. MSVC is not testing it as well.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-07-16 Thread Pavel Stehule

Hi

2016-04-05 10:45 GMT+02:00 Pavel Stehule :

> Hi
>
> here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
> formats for COPY statements.
>
> The RAW with text formats means unescaped data, but with correct encoding
> - input/output is realised with input/output function. RAW binary means
> content produced/received by sending/received functions.
>
> Now both directions (input/output) working well
>
> Some examples of expected usage:
>
> copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
> encoding 'latin2');
>
> create table avatars(id serial, picture bytea);
> \copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
> select lastval();
>
> create table doc(id serial, txt text);
> \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
> select lastval();
>
> Regards
>
> Pavel
>
>
I am sending fresh version of COPY RAW patch.

There is new regress client test requested by Tom.

Note: I though about another solution based on binary parameters and binary
result support in psql. Somelike:

INSERT INTO foo(a) VALUES($1)
\gpush filename

SELECT a FROM foo
\gpop filename

but, it is less intuitive, and doesn't work with stdin/stdout - so it is
significant week against COPY based solution for scripting from shell. More
\g* solution is still possible if will be requested in future.

Regards

Pavel

[pavel@nemesis ~]$ cat avatar.gif | psql -Xq -At -c "copy xx(b) from stdin
(format raw_text)" -c "select lastval()" postgres
313
commit d62c1ff8dee2324ce1fe7765c2d015e68f5f923a
Author: Pavel Stehule 
Date:   Sat Jul 16 10:35:25 2016 +0200

with regress tests

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 6285dd0..4c6cacb 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -3226,8 +3226,9 @@ int PQfformat(const PGresult *res,
 
   
Format code zero indicates textual data representation, while format
-   code one indicates binary representation.  (Other codes are reserved
-   for future definition.)
+   code one indicates binary representation. Format code two indicates
+   raw_text representation and format code three indicates raw_binary
+   representation (Other codes are reserved for future definition.)
   
  
 
@@ -3557,6 +3558,26 @@ typedef struct

 

+
+ 
+  PQcopyFormat
+  
+   PQcopyFormat
+  
+ 
+
+ 
+  
+   Format code zero indicates textual data representation, format one
+   indicates binary representation, format two indicates raw
+   representation.
+
+int PQcopyFormat(PGresult *res);
+
+  
+ 
+
+
 
  
   PQcmdStatus
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9c96d8f..adcff46 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -3239,6 +3239,7 @@ CopyInResponse (B)
 characters, etc).
 1 indicates the overall copy format is binary (similar
 to DataRow format).
+2 indicates the overall copy format is raw.
 See 
 for more information.
 
@@ -3262,8 +3263,9 @@ CopyInResponse (B)
 
 
 The format codes to be used for each column.
-Each must presently be zero (text) or one (binary).
-All must be zero if the overall copy format is textual.
+Each must be zero (text), one (binary), two (raw_text)
+or three (raw_binary). All must be zero if the overall
+copy format is textual.
 
 
 
@@ -3313,7 +3315,8 @@ CopyOutResponse (B)
 is textual (rows separated by newlines, columns
 separated by separator characters, etc). 1 indicates
 the overall copy format is binary (similar to DataRow
-format). See  for more information.
+format). 2 indicates raw_text or raw_binary format.
+See  for more information.
 
 
 
@@ -3335,8 +3338,9 @@ CopyOutResponse (B)
 
 
 The format codes to be used for each column.
-Each must presently be zero (text) or one (binary).
-All must be zero if the overall copy format is textual.
+Each must be zero (text), one (binary), two (raw_text)
+or three (raw_binary). All must be zero if the overall
+copy format is textual.
 
 
 
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 07e2f45..4e339e4 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -197,7 +197,9 @@ COPY { table_name [ ( text,
   csv (Comma Separated Values),
-  or binary.
+  binary,
+  raw_text
+  or raw_binary.
   The default is text.
  
 
@@ -888,6 +890,44 @@ OIDs to be shown as null if that ever proves desirable.
 

   
+
+  
+

Re: [HACKERS] raw output from copy

2016-04-12 Thread Pavel Stehule

2016-04-12 22:48 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > I had a idea about additional options of COPY RAW statements. One can be
> > CAST function. These CAST functions can be used to any for any format.
>
> Uh, what?  CAST() is not about external representations of values, and
> overloading it for that purpose doesn't seem like a particularly good
> idea: you'd have to figure out what the conversions meant inside SQL as
> well as externally.  Also, maybe I missed something, but a different
> representation of individual data values within a COPY wasn't what we
> were after here.
>

I didn't think about this idea to deep - so there can be more than one
problem. More - I though about it before you designed RAW_TEXT mode - that
can coverage this use case too.

Originally I had only RAW mode, what can be difficult for JSONB, so my
solution was

COPY target(jsonbcol) FROM jsondata OPTIONS(RAW, CAST(json_to_jsonb)).

Now this idea is obsolete, because anybody can do

COPY target(jsonbcol) FROM jsondata OPTIONS(RAW_TEXT)

What is much more simple.

Using explicit casts in COPY statement was motivated by possible
requirement do some manipulations with data before their storing to table.
It is idea, and probably wrong idea.

I don't want to increase complexity of COPY statement too much. My goal is
enhance COPY to import single objects simply. And if you need some more
complex, then you can write some simple application where can be used
classic COPY or COPY RAW again (because it doesn't require escaping).

Regards

Pavel

>
> regards, tom lane
>

Re: [HACKERS] raw output from copy

2016-04-12 Thread Tom Lane

Pavel Stehule  writes:
> I had a idea about additional options of COPY RAW statements. One can be
> CAST function. These CAST functions can be used to any for any format.

Uh, what?  CAST() is not about external representations of values, and
overloading it for that purpose doesn't seem like a particularly good
idea: you'd have to figure out what the conversions meant inside SQL as
well as externally.  Also, maybe I missed something, but a different
representation of individual data values within a COPY wasn't what we
were after here.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-12 Thread Pavel Stehule

2016-04-12 12:22 GMT+02:00 Ants Aasma :

> On 8 Apr 2016 9:14 pm, "Pavel Stehule"  wrote:
> > 2016-04-08 20:54 GMT+02:00 Andrew Dunstan :
> >> I should add that I've been thinking about this some more, and that I
> now agree that something should be done to support this at the SQL level,
> mainly so that clients can manage very large pieces of data in a
> stream-oriented fashion rather than having to marshall the data in memory
> to load/unload via INSERT/SELECT. Anything that is client-side only is
> likely to have this memory issue.
> >>
> >> At the same time I'm still not entirely convinced that COPY is a good
> vehicle for this. It's designed for bulk records, and already quite
> complex. Maybe we need something new that uses the COPY protocol but is
> more specifically tailored for loading or sending large singleton pieces of
> data.
> >
> >
> > Now it is little bit more time to think more about. But It is hard to
> design some more simpler than is COPY syntax. What will support both
> directions.
>
> Sorry for arriving late and adding to the bikeshedding. Maybe the
> answer is to make COPY pluggable. It seems to me that it would be
> relatively straightforward to add an extension mechanism for copy
> output and input plugins that could support any format expressible as
> a binary stream. Raw output would then be an almost trivial plugin.
> Others could implement JSON, protocol buffers, Redis bulk load, BSON,
> ASN.1 or whatever else serialisation format du jour. It will still
> have the same backwards compatibility issues as adding the raw output,
> but the payoff is greater.
>

I had a idea about additional options of COPY RAW statements. One can be
CAST function. These CAST functions can be used to any for any format.

COPY has two parts - client, and server side. Currently we cannot to expand
libpq, and we cannot to expand psql. So we have to send data to client in
target format and all transformations should be done on server side.
Personally, I strongly prefer to write Linux server side extensions against
MSWin client side extensions. The client (psql) is able to use a pipe - so
any client side transformation can be done outer psql.

Regards

Pavel




>
> Regards,
> Ants Aasma
>

Re: [HACKERS] raw output from copy

2016-04-12 Thread Ants Aasma

On 8 Apr 2016 9:14 pm, "Pavel Stehule"  wrote:
> 2016-04-08 20:54 GMT+02:00 Andrew Dunstan :
>> I should add that I've been thinking about this some more, and that I now 
>> agree that something should be done to support this at the SQL level, mainly 
>> so that clients can manage very large pieces of data in a stream-oriented 
>> fashion rather than having to marshall the data in memory to load/unload via 
>> INSERT/SELECT. Anything that is client-side only is likely to have this 
>> memory issue.
>>
>> At the same time I'm still not entirely convinced that COPY is a good 
>> vehicle for this. It's designed for bulk records, and already quite complex. 
>> Maybe we need something new that uses the COPY protocol but is more 
>> specifically tailored for loading or sending large singleton pieces of data.
>
>
> Now it is little bit more time to think more about. But It is hard to design 
> some more simpler than is COPY syntax. What will support both directions.

Sorry for arriving late and adding to the bikeshedding. Maybe the
answer is to make COPY pluggable. It seems to me that it would be
relatively straightforward to add an extension mechanism for copy
output and input plugins that could support any format expressible as
a binary stream. Raw output would then be an almost trivial plugin.
Others could implement JSON, protocol buffers, Redis bulk load, BSON,
ASN.1 or whatever else serialisation format du jour. It will still
have the same backwards compatibility issues as adding the raw output,
but the payoff is greater.

Regards,
Ants Aasma

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-08 Thread Pavel Stehule

2016-04-08 20:54 GMT+02:00 Andrew Dunstan :

>
>
> On 04/08/2016 02:13 PM, Robert Haas wrote:
>
>> On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule 
>> wrote:
>>
>>> here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
>>> formats for COPY statements.
>>>
>>> The RAW with text formats means unescaped data, but with correct
>>> encoding -
>>> input/output is realised with input/output function. RAW binary means
>>> content produced/received by sending/received functions.
>>>
>>> Now both directions (input/output) working well
>>>
>>> Some examples of expected usage:
>>>
>>> copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
>>> encoding 'latin2');
>>>
>>> create table avatars(id serial, picture bytea);
>>> \copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
>>> select lastval();
>>>
>>> create table doc(id serial, txt text);
>>> \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
>>> select lastval();
>>>
>> As much as I know you and some other people would like it to be
>> otherwise, this patch clearly does not have a sufficient degree of
>> consensus to justify committing it to PostgreSQL 9.6.  I'm marking it
>> Returned with Feedback.
>>
>>
>
> I should add that I've been thinking about this some more, and that I now
> agree that something should be done to support this at the SQL level,
> mainly so that clients can manage very large pieces of data in a
> stream-oriented fashion rather than having to marshall the data in memory
> to load/unload via INSERT/SELECT. Anything that is client-side only is
> likely to have this memory issue.
>
> At the same time I'm still not entirely convinced that COPY is a good
> vehicle for this. It's designed for bulk records, and already quite
> complex. Maybe we need something new that uses the COPY protocol but is
> more specifically tailored for loading or sending large singleton pieces of
> data.
>

Now it is little bit more time to think more about. But It is hard to
design some more simpler than is COPY syntax. What will support both
directions.

My implementation has same limit like COPY BINARY - it isn't worse. It
should be good enough for VARLENA types that should not be higher than 1GB.
It is not designed for LOB replacement.

Regards

Pavel


>
> cheers
>
> andrew
>
>
>
>

Re: [HACKERS] raw output from copy

2016-04-08 Thread Andrew Dunstan




On 04/08/2016 02:13 PM, Robert Haas wrote:

On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule  wrote:

here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
formats for COPY statements.

The RAW with text formats means unescaped data, but with correct encoding -
input/output is realised with input/output function. RAW binary means
content produced/received by sending/received functions.

Now both directions (input/output) working well

Some examples of expected usage:

copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
encoding 'latin2');

create table avatars(id serial, picture bytea);
\copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
select lastval();

create table doc(id serial, txt text);
\copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
select lastval();

As much as I know you and some other people would like it to be
otherwise, this patch clearly does not have a sufficient degree of
consensus to justify committing it to PostgreSQL 9.6.  I'm marking it
Returned with Feedback.




I should add that I've been thinking about this some more, and that I 
now agree that something should be done to support this at the SQL 
level, mainly so that clients can manage very large pieces of data in a 
stream-oriented fashion rather than having to marshall the data in 
memory to load/unload via INSERT/SELECT. Anything that is client-side 
only is likely to have this memory issue.


At the same time I'm still not entirely convinced that COPY is a good 
vehicle for this. It's designed for bulk records, and already quite 
complex. Maybe we need something new that uses the COPY protocol but is 
more specifically tailored for loading or sending large singleton pieces 
of data.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-08 Thread Pavel Stehule

2016-04-08 20:13 GMT+02:00 Robert Haas :

> On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule 
> wrote:
> > here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
> > formats for COPY statements.
> >
> > The RAW with text formats means unescaped data, but with correct
> encoding -
> > input/output is realised with input/output function. RAW binary means
> > content produced/received by sending/received functions.
> >
> > Now both directions (input/output) working well
> >
> > Some examples of expected usage:
> >
> > copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
> > encoding 'latin2');
> >
> > create table avatars(id serial, picture bytea);
> > \copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
> > select lastval();
> >
> > create table doc(id serial, txt text);
> > \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
> > select lastval();
>
> As much as I know you and some other people would like it to be
> otherwise, this patch clearly does not have a sufficient degree of
> consensus to justify committing it to PostgreSQL 9.6.  I'm marking it
> Returned with Feedback.
>

ok, I'll try to complete this patch

Regards

Pavel


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Re: [HACKERS] raw output from copy

2016-04-08 Thread Robert Haas

On Tue, Apr 5, 2016 at 4:45 AM, Pavel Stehule  wrote:
> here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
> formats for COPY statements.
>
> The RAW with text formats means unescaped data, but with correct encoding -
> input/output is realised with input/output function. RAW binary means
> content produced/received by sending/received functions.
>
> Now both directions (input/output) working well
>
> Some examples of expected usage:
>
> copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
> encoding 'latin2');
>
> create table avatars(id serial, picture bytea);
> \copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
> select lastval();
>
> create table doc(id serial, txt text);
> \copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
> select lastval();

As much as I know you and some other people would like it to be
otherwise, this patch clearly does not have a sufficient degree of
consensus to justify committing it to PostgreSQL 9.6.  I'm marking it
Returned with Feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-05 Thread Pavel Stehule

Hi

here is cleaned/finished previous implementation of RAW_TEXT/RAW_BINARY
formats for COPY statements.

The RAW with text formats means unescaped data, but with correct encoding -
input/output is realised with input/output function. RAW binary means
content produced/received by sending/received functions.

Now both directions (input/output) working well

Some examples of expected usage:

copy (select xmlelement(name foo, 'hello')) to stdout (format raw_binary,
encoding 'latin2');

create table avatars(id serial, picture bytea);
\copy avatars(picture) from ~/images/foo.jpg (format raw_binary);
select lastval();

create table doc(id serial, txt text);
\copy doc(txt) from ~/files/aaa.txt (format raw_text, encoding 'latin2');
select lastval();

Regards

Pavel
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
new file mode 100644
index 3829a14..7b9ed73
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*** int PQfformat(const PGresult *res,
*** 3226,3233 
  

 Format code zero indicates textual data representation, while format
!code one indicates binary representation.  (Other codes are reserved
!for future definition.)

   
  
--- 3226,3234 
  

 Format code zero indicates textual data representation, while format
!code one indicates binary representation. Format code two indicates
!raw_text representation and format code three indicates raw_binary
!representation (Other codes are reserved for future definition.)

   
  
*** typedef struct
*** 3557,3562 
--- 3558,3583 
 
  
 
+ 
+  
+   PQcopyFormat
+   
+PQcopyFormat
+   
+  
+ 
+  
+   
+Format code zero indicates textual data representation, format one
+indicates binary representation, format two indicates raw
+representation.
+ 
+ int PQcopyFormat(PGresult *res);
+ 
+   
+  
+ 
+ 
  
   
PQcmdStatus
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
new file mode 100644
index 522128e..e783b30
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
*** CopyInResponse (B)
*** 3239,3244 
--- 3239,3245 
  characters, etc).
  1 indicates the overall copy format is binary (similar
  to DataRow format).
+ 2 indicates the overall copy format is raw.
  See 
  for more information.
  
*** CopyInResponse (B)
*** 3262,3269 
  
  
  The format codes to be used for each column.
! Each must presently be zero (text) or one (binary).
! All must be zero if the overall copy format is textual.
  
  
  
--- 3263,3271 
  
  
  The format codes to be used for each column.
! Each must be zero (text), one (binary), two (raw_text)
! or three (raw_binary). All must be zero if the overall
! copy format is textual.
  
  
  
*** CopyOutResponse (B)
*** 3313,3319 
  is textual (rows separated by newlines, columns
  separated by separator characters, etc). 1 indicates
  the overall copy format is binary (similar to DataRow
! format). See  for more information.
  
  
  
--- 3315,3322 
  is textual (rows separated by newlines, columns
  separated by separator characters, etc). 1 indicates
  the overall copy format is binary (similar to DataRow
! format). 2 indicates raw_text or raw_binary format.
! See  for more information.
  
  
  
*** CopyOutResponse (B)
*** 3335,3342 
  
  
  The format codes to be used for each column.
! Each must presently be zero (text) or one (binary).
! All must be zero if the overall copy format is textual.
  
  
  
--- 3338,3346 
  
  
  The format codes to be used for each column.
! Each must be zero (text), one (binary), two (raw_text)
! or three (raw_binary). All must be zero if the overall
! copy format is textual.
  
  
  
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
new file mode 100644
index 07e2f45..4e339e4
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*** COPY { ta
*** 197,203 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
!   or binary.
The default is text.
   
  
--- 197,205 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
!   binary,
!   raw_text
!   or raw_binary.
The default is text.

Re: [HACKERS] raw output from copy

2016-04-04 Thread Daniel Verite

Tom Lane wrote:

> >> Code that uses PQexecParams() binary "resultFormat", or the
> >> binary format of copy doesn't have that problem,  but most
> >> client-side drivers don't do that.
> 
> > And maybe they just can't realistically, because  getting result
> > format in binary is exposed as an all-or-nothing choice in libpq.
> 
> That's simply wrong.  Read the documentation for PQexecParams and
> friends: you can specify text or binary per-column.  It's COPY that
> has the only-one-column-format restriction, and RAW certainly isn't
> going to make that better.

About PQexecParams, I disagree, the parameters formats can be
specified independantly, but the not the results, which are either all
binary or all text.

Quoting the doc at
http://www.postgresql.org/docs/9.5/static/libpq-exec.html

PGresult *PQexecParams(PGconn *conn,
   const char *command,
   int nParams,
   const Oid *paramTypes,
   const char * const *paramValues,
   const int *paramLengths,
   const int *paramFormats,
   int resultFormat);
[...]

resultFormat:
Specify zero to obtain results in text format, or one to obtain results
in binary format. (There is not currently a provision to obtain different
result columns in different formats, although that is possible in the
underlying protocol.)

For the client-side drivers that I've looked at, like these used in php
or perl, they just never use resultFormat=1.
I assume that they consider that having all values
in binary is unworkable for them, which is reasonable.
Maybe if they had a per-column choice, they wouldn't
use it anyway, but at least it would be theirs to decide

All this is only tangentially related to COPY RAW.
It's just that COPY RAW can be seen as an efficient alternative to
the single-column returning [SELECT bytea_column FROM...]
The drivers currently request this in text mode even though
it makes no sense in this particular case, and it gets measurably
annoying if the contents are big.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-04 Thread David G. Johnston

On Fri, Apr 1, 2016 at 8:42 AM, Daniel Verite 
wrote:

> Andrew Dunstan wrote:
>
> > If someone can make a good case that this is going to be of
> > general use I'll happily go along, but I haven't seen one so far.
>
> About COPY FROM with a raw format, for instance just yesterday
> there was this user question on stackoverflow:
> http://stackoverflow.com/questions/36317237
>
> which essentially is: how to import contents from a file without any
> particular interpretation of any character?
>
> With the patch discussed in this thread, a user can do
> \copy table(textcol) from /path/to/file (format raw)
>

What is needed to solve this specific use-case is a way to specify "QUOTE
NONE" instead of the default for whatever format is being hijacked:

COPY file_content FROM '/tmp/textfile.txt' WITH (FORMAT csv, QUOTE
E'');

becomes

COPY file_content FROM '/tmp/textfile.txt' WITH (FORMAT csv, QUOTE NONE);

Or maybe: "WITH (FORMAT single_column)"

Though maybe that doesn't extend well to unencoded binary data...which
seems like it can be considered a separate problem from reliably importing
an entire file into a single row+column in a table.

David J.

Re: [HACKERS] raw output from copy

2016-04-04 Thread Tom Lane

"Daniel Verite"  writes:
> One reason of adding the format to COPY is that it's where users
> are looking for it. It's the canonical way of importing contents
> from files so that's where it makes more sense.

I'm not sure I buy that argument, because it could be used to justify
adding absolutely any ETL functionality to COPY.  And we don't want
to go down that path; the design intention for COPY is that it be as
simple and fast as possible.

>> And I am still waiting for a non-psql use case. But I don't expect to 
>> see one, precisely because most clients have no difficulty at all in 
>> handling binary data.

> You mean small or medium-size binary data. The 512MB-1GB range is
> impossible to handle if requested in text format, which is what drivers
> tend to use. Even pg_dump fails on these contents.

... which is COPY.  I do not see that RAW mode is going to help much
here: it's not going to be noticeably better than COPY BINARY in terms
of maximum field width.

>> Code that uses PQexecParams() binary "resultFormat", or the
>> binary format of copy doesn't have that problem,  but most
>> client-side drivers don't do that.

> And maybe they just can't realistically, because  getting result
> format in binary is exposed as an all-or-nothing choice in libpq.

That's simply wrong.  Read the documentation for PQexecParams and
friends: you can specify text or binary per-column.  It's COPY that
has the only-one-column-format restriction, and RAW certainly isn't
going to make that better.

I'm not quite as convinced as Andrew that RAW mode is unnecessary,
but I don't find these arguments for it to be very compelling.

The real issue to my mind is that it doesn't seem like we can shoehorn
a sanely-defined version of RAW into the existing protocol spec without
creating compatibility hazards.  So we can either wait for the mythical
protocol v4 (but even a protocol update wouldn't fix the application-level
hazards) or we can treat it as a problem to be solved client-side.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-04 Thread Daniel Verite

Andrew Dunstan wrote:

> Inserting the whole contents of a text file unchanged is insanely easy 
> in psql.
> 
>  \set file `cat /path/to/file`
>  insert into mytable(contents) values(:'file');

That's assuming psql but the asker of that question never mentioned
using psql. The COPY invocation could be inside a function. Even if
that particular user would be fine with a psql-only option, the next
one might not. Or they might want to import a binary file, and
as you mention, currently there's no equivalent of the :'var'
feature for binary.

But there's another aspect to this that's worth of consideration,
and that this forum question illustrates.
One reason of adding the format to COPY is that it's where users
are looking for it. It's the canonical way of importing contents
from files so that's where it makes more sense.
From the POV of being user friendly and consistent, restricting what
COPY can do because psql could do it completely differently
if the user was psql-savvy enough to know it, what sense does it
make?

> And I am still waiting for a non-psql use case. But I don't expect to 
> see one, precisely because most clients have no difficulty at all in 
> handling binary data.

You mean small or medium-size binary data. The 512MB-1GB range is
impossible to handle if requested in text format, which is what drivers
tend to use. Even pg_dump fails on these contents.
Maybe it was unimportant when bytea was added ~15 years ago,
but the size of data that people actually put into bytea columns is
growing, following Moore's law like the rest.

Even in the lower size range, considering the amount of memory allocated
and the time spent to convert to hex, sending twice the number
of bytes on the wire, just to do the reverse conversion in the client
as soon as all data is obtained, it works but it's pointless
and inefficient.

Code that uses PQexecParams() binary "resultFormat", or the
binary format of copy doesn't have that problem,  but most
client-side drivers don't do that.

And maybe they just can't realistically, because  getting result
format in binary is exposed as an all-or-nothing choice in libpq.

I mean if client code does SELECT * FROM table or even COPY of the
same, and what comes back is bytea and e.g. timestamps and floats and
custom types, the client-side driver may wish to have the bytea field in
binary format for efficiency and the rest in text format for
usability, but that's not possible with PQexecParams(), or other
libpq functions.

The point of mixing binary and text is outside the scope of a RAW
format for COPY, as obviously it wouldn't help with that in any way,
but on the argument that the status quo is fine because clients
have no difficulty, that's just not true. Clients cope with what they have,
but what they have is far from being complete or optimal.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-02 Thread Andrew Dunstan




On 04/01/2016 11:42 AM, Daniel Verite wrote:

Andrew Dunstan wrote:


If someone can make a good case that this is going to be of
general use I'll happily go along, but I haven't seen one so far.

About COPY FROM with a raw format, for instance just yesterday
there was this user question on stackoverflow:
http://stackoverflow.com/questions/36317237

which essentially is: how to import contents from a file without any
particular interpretation of any character?\



There is so much wrong with this it's hard to know where to start.

Inserting the whole contents of a text file unchanged is insanely easy 
in psql.


\set file `cat /path/to/file`
insert into mytable(contents) values(:'file');

What is more everyone on SO missed the fact that CSV mode gives you very 
considerable control over the quote, delimiter and null settings.


See for example 
 which 
has this example for handling files consisting of 1 json document per line:


copy the_table(jsonfield)
from '/path/to/jsondata'
csv quote e'\x01' delimiter e'\x02';

psql's \copy will work just the same way

(I noticed with amusement this week that CitusData is using pretty much 
exactly this in one of their examples.)




With the patch discussed in this thread, a user can do
\copy table(textcol) from /path/to/file (format raw)
or the equivalent COPY.
If it's a binary column, that works just the same.



It would be fairly simple to invent a binary mechanism that did the 
equivalent of the above insert. All without any change to SQL or the 
backend at all.





Without this, it's not obvious at all how this result can be
achieved without resorting to external preprocessing,
and assuming the availability of such preprocessing tools
in the environment. Notwithstanding the fact that the
solution proposed on SO (doubling backslashes with sed)
doesn't even work if the file contains tabs, as they would be
interpreted as field separators, even if the copy target has only
one column. You can change the delimiter with COPY but AFAIK
you can't tell that there is none.



There is arguably a good case for allowing a null delimiter. But that SO 
page is just a terrible piece of misinformation, as far too often 
happens in my experience.


And I am still waiting for a non-psql use case. But I don't expect to 
see one, precisely because most clients have no difficulty at all in 
handling binary data.


cheers

andrew








--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-04-01 Thread Daniel Verite

Andrew Dunstan wrote:

> If someone can make a good case that this is going to be of
> general use I'll happily go along, but I haven't seen one so far.

About COPY FROM with a raw format, for instance just yesterday
there was this user question on stackoverflow:
http://stackoverflow.com/questions/36317237

which essentially is: how to import contents from a file without any
particular interpretation of any character?

With the patch discussed in this thread, a user can do
\copy table(textcol) from /path/to/file (format raw)
or the equivalent COPY.
If it's a binary column, that works just the same.

Without this, it's not obvious at all how this result can be
achieved without resorting to external preprocessing,
and assuming the availability of such preprocessing tools
in the environment. Notwithstanding the fact that the
solution proposed on SO (doubling backslashes with sed)
doesn't even work if the file contains tabs, as they would be
interpreted as field separators, even if the copy target has only
one column. You can change the delimiter with COPY but AFAIK
you can't tell that there is none.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-31 Thread Pavel Stehule

2016-03-31 9:48 GMT+02:00 Craig Ringer :

> On 31 March 2016 at 14:40, Pavel Stehule  wrote:
>
>
>> this patch doesn't break any old application. Accepting new feature
>> depends on binary method detection. PQbinaryTuples based clients should
>> to support COPY RAW* without problems, PQfformat() should to report
>> unknown format.
>>
>>
> PgJDBC does not use libpq.
>

so it can be interesting test

Pavel


>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Re: [HACKERS] raw output from copy

2016-03-31 Thread Craig Ringer

On 31 March 2016 at 14:40, Pavel Stehule  wrote:


> this patch doesn't break any old application. Accepting new feature
> depends on binary method detection. PQbinaryTuples based clients should
> to support COPY RAW* without problems, PQfformat() should to report
> unknown format.
>
>
PgJDBC does not use libpq.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] raw output from copy

2016-03-31 Thread Pavel Stehule

2016-03-31 8:34 GMT+02:00 Craig Ringer :

> On 30 March 2016 at 00:19, Tom Lane  wrote:
>
>> Pavel Stehule  writes:
>> > I tested COPY RAW on old psql clients - and it is working without any
>> > problem - so when the client uses same logic as psql, then it should to
>> > work. Sure, there can be differently implemented clients, but the COPY
>> > client side is usually simple - store stream to output.
>>
>> My point is precisely that I doubt all clients are that stupid about COPY.
>>
>
> PgJDBC definitely isn't.
>
> Any changes really need to be tested against PgJDBC's CopyManager.
>

this patch doesn't break any old application. Accepting new feature depends
on binary method detection. PQbinaryTuples based clients should to support
COPY RAW* without problems, PQfformat() should to report unknown format.

Regards

Pavel


>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

Re: [HACKERS] raw output from copy

2016-03-31 Thread Craig Ringer

On 30 March 2016 at 00:19, Tom Lane  wrote:

> Pavel Stehule  writes:
> > I tested COPY RAW on old psql clients - and it is working without any
> > problem - so when the client uses same logic as psql, then it should to
> > work. Sure, there can be differently implemented clients, but the COPY
> > client side is usually simple - store stream to output.
>
> My point is precisely that I doubt all clients are that stupid about COPY.
>

PgJDBC definitely isn't.

Any changes really need to be tested against PgJDBC's CopyManager.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] raw output from copy

2016-03-31 Thread Pavel Stehule

Hi

2016-03-29 20:59 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > I am writing few lines as summary:
>
> > 1. invention RAW_TEXT and RAW_BINARY
> > 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> > 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0,
> but
> > the client should to check PQcopyFormat() to not print "\n" on the end
> > 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1,
> but
> > used output function, not necessary client modification
> > 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> > RAW_BINARY
> > 5. create tests for ecpg
>
> 3.b certainly seems completely wrong.  PQfformat==1 would imply binary
> data.
>
> I suggest that PQcopyFormat should be understood as defining the format
> of the copy data encapsulation, not the individual fields.  So it would go
> like 0 = traditional text format, 1 = traditional binary format, 2 = raw
> (no encapsulation).  You'd need to also look at PQfformat to distinguish
> raw text from raw binary.  But if we do it as you suggest above, we've
> locked ourselves into only ever having two field format codes, which
> is something the existing design is specifically intended to allow
> expansion in.
>
>
I wrote concept of raw_text, raw_binary modes.

I am trying to implement text data passing like text format - but for
RAW_TEXT it is not practical. Text passing is designed for one line data,
for multiline data enforces escaping, what we don't would for RAW mode. I
have to skip escaping, and the code is not nice.

So I propose different schema - RAW_TEXT uses text values (uses
input/output functions), enforce encoding from/to client codes and for
passing to client mode is used binary mode - then I don't need to read the
content with line by line. PQbinaryTuples() returns 1 for RAW_TEXT and
RAW_BINARY - in these cases data are passed as one binary value. PQfformat
returns 2 for RAW_TEXT and 3 for RAW_BINARY.

Any objections to this design?

Regards

Pavel




> regards, tom lane
>
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
new file mode 100644
index 07e2f45..68fbfd8
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*** COPY { ta
*** 197,203 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
!   or binary.
The default is text.
   
  
--- 197,203 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
!   binary or raw.
The default is text.
   
  
*** OIDs to be shown as null if that ever pr
*** 888,893 
--- 888,925 
  
 

+ 
+   
+  Raw Format
+ 
+
+ The raw format option causes all data to be
+ stored/read as binary format rather than as text. It shares format
+ for data with binary format. This format doesn't
+ use any metadata - only row data in network byte order are exported
+ or imported.
+
+ 
+
+ Because this format doesn't support any delimiter, only one value
+ can be exported or imported. NULL values are not allowed.
+
+
+ The raw format can be used for export or import
+ bytea values.
+ 
+ COPY images(data) FROM '/usr1/proj/img/01.jpg' (FORMAT raw);
+ 
+ It can be used successfully for export XML in different encoding
+ or import valid XML document with any supported encoding:
+ 
+
+   
   
  
   
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 3201476..1de36b6
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** typedef enum EolType
*** 89,94 
--- 89,99 
   * it's faster to make useless comparisons to trailing bytes than it is to
   * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
   * when we have to do it the hard way.
+  *
+  * COPY supports three modes: text, binary, raw_text and raw_binary. The text
+  * format is plain text multiline format with specified delimiter. The binary
+  * format holds metadata (numbers, sizes) and data. The raw format holds data
+  * only and only one non NULL value can be processed.
   */
  typedef struct CopyStateData
  {
*** typedef struct CopyStateData
*** 110,115 
--- 115,121 
  	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
  	bool		is_program;		/* is 'filename' a program to popen? */
  	bool		binary;			/* binary format? */
+ 	bool		raw;			/* required raw binary? */
  	bool		oids;			/* include OIDs? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
*** typedef struct CopyStateData
*** 199,204 
--- 205,213 
  	char	   *raw_buf;
  	int			raw_buf_index;	/* next byte to process */
  	int			raw_buf_len;	/* total # of

Re: [HACKERS] raw output from copy

2016-03-29 Thread Tom Lane

Andrew Dunstan  writes:
> The I would suggest we try to invent something for psql which does help 
> with it. I just don't see this as an SQL problem.

There's certainly a lot to be said for that approach.  I'm still not
convinced that we can make COPY do this without creating compatibility
issues, regardless of the details; and it doesn't seem like a big
enough problem to be worth taking any risks of that sort.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-29 Thread Andrew Dunstan




On 03/28/2016 11:18 PM, Pavel Stehule wrote:




Anyway this is certainly not committable as-is, so I'm setting
it back
to Waiting on Author.  But the fact that both libpq and ecpg
would need
updates makes me question whether we can safely pretend that
this isn't
a protocol break.




In that case I humbly submit that there is a case for reviving the
psql patch I posted that kicked off this whole thing and lets you
export a piece of binary data from psql quite easily. It should
certainly not involve any protocol break.


The psql only solution can work only for output. Doesn't help with input.





The I would suggest we try to invent something for psql which does help 
with it. I just don't see this as an SQL problem. Pretty much any driver 
library will have no difficulty in handling binary input and output. 
It's only psql that has an issue, ISTM, and therefore I believe that's 
where the fix should go. What else is going to use this? As an SQL 
change this seems like a solution in search of a problem. If someone can 
make a good case that this is going to be of general use I'll happily go 
along, but I haven't seen one so far.


cheers

andrdew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-29 Thread Pavel Stehule

2016-03-29 20:59 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > I am writing few lines as summary:
>
> > 1. invention RAW_TEXT and RAW_BINARY
> > 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> > 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0,
> but
> > the client should to check PQcopyFormat() to not print "\n" on the end
> > 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1,
> but
> > used output function, not necessary client modification
> > 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> > RAW_BINARY
> > 5. create tests for ecpg
>
> 3.b certainly seems completely wrong.  PQfformat==1 would imply binary
> data.
>
> I suggest that PQcopyFormat should be understood as defining the format
> of the copy data encapsulation, not the individual fields.  So it would go
> like 0 = traditional text format, 1 = traditional binary format, 2 = raw
> (no encapsulation).  You'd need to also look at PQfformat to distinguish
> raw text from raw binary.  But if we do it as you suggest above, we've
> locked ourselves into only ever having two field format codes, which
> is something the existing design is specifically intended to allow
> expansion in.
>

I have a less courage than you :). The original design worked with almost
clients without changes on client side. New design has lot of combinations,
that are unknown for old clients. It can be better, because the client
authors will do update faster.

If PQfformat will returns 0 = text, 1 = traditional binary, 2 = raw text, 3
= raw binary - like you propose, then PQcopyFormat is useless. I see all
information just from PQfformat.

Regards

Pavel


>
> regards, tom lane
>

Re: [HACKERS] raw output from copy

2016-03-29 Thread Tom Lane

Pavel Stehule  writes:
> I am writing few lines as summary:

> 1. invention RAW_TEXT and RAW_BINARY
> 2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
> 3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
> the client should to check PQcopyFormat() to not print "\n" on the end
> 3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
> used output function, not necessary client modification
> 4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
> RAW_BINARY
> 5. create tests for ecpg

3.b certainly seems completely wrong.  PQfformat==1 would imply binary
data.

I suggest that PQcopyFormat should be understood as defining the format
of the copy data encapsulation, not the individual fields.  So it would go
like 0 = traditional text format, 1 = traditional binary format, 2 = raw
(no encapsulation).  You'd need to also look at PQfformat to distinguish
raw text from raw binary.  But if we do it as you suggest above, we've
locked ourselves into only ever having two field format codes, which
is something the existing design is specifically intended to allow
expansion in.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-29 Thread Pavel Stehule

Hi

2016-03-29 18:19 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > I tested COPY RAW on old psql clients - and it is working without any
> > problem - so when the client uses same logic as psql, then it should to
> > work. Sure, there can be differently implemented clients, but the COPY
> > client side is usually simple - store stream to output.
>
> My point is precisely that I doubt all clients are that stupid about COPY.
>
> > Maybe I am blind, but I don't see any new security risks. The risk can be
> > only on client side - and if client is not able work with new value, then
> > it can fails.
>
> Well, the point is that low-level code might get used to process the data
> stream for commands it doesn't have any control over.  Maybe there's no
> realistic security risk there, or maybe there is; I'm not sure.
>
> > I am thinking so PQbinaryTuples should to return 1 (without change), and
> > PQfformat should to return 2.
>
> Well, that seems pretty backwards to me.  The format of the individual
> fields is still what it is under COPY BINARY; you would not use a
> different per-field transformation.  You do need to know about the
> overall format of the copy data stream being different, and defining
> PQbinaryTuples as still returning 1 means there's no clean way to
> understand overall copy format vs. per-field format.
>

> There's a case to be made that we should invent a new function named
> along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
> some more.  That function is currently deprecated and I'm not very happy
> with un-deprecating it only to use it in a confusing way.
>

I see a introduction of PQcopyFormat() as best idea. So for
PQbinaryTuples() and PQfformat() these new changes are transparent - and
PQcopyFormat can returns info about used method.

> To be more concrete about this: I think it's actually rather broken
> that this patch ties RAW to binary format of the field contents.
> Why would it not be exactly as useful to have delimiter-less COPY
> of textual data, for use when there's just one datum and/or you're
> confident in picking the data apart for yourself?  But as things stand
> it'd be too confusing for an application to try to figure out what's
> happening in such a case.
>
> So I think we should either invent RAW_TEXT and RAW_BINARY formats
> (not just RAW) or make RAW be an orthogonal copy option.  And we need
> to improve libpq's behavior enough so that applications can sanely
> figure out what's happening.
>

I had a use case that required binary mode. Higher granularity has sense.

This opening new question - RAW_TEXT will use text output function. But if
I will pass this value as text value, then a behave of current clients will
be same as usual COPY. So I need to use binary protocol. And then the
behave of PQbinaryTuples() and PQfformat() is the question? Although text
value can be passed in binary mode too (with format [length, data...]).

>
> > I executed all tests in libpq and ecpg without any problems. Can you,
> > please, help me with repeating a ecpg issues?
>
> Of course the ecpg tests pass; you didn't extend them to see what would
> happen if someone tries COPY RAW with ecpg.   Likewise, we have no tests
> exercising a client's use of libpq with more intelligence than psql has
> got.  But that doesn't mean it's acceptable to write this patch with no
> thought for such clients.
>

if we don't change PQbinaryTuples() and PQfformat(), then COPY RAW should
be transparent for any client. Server sending data in binary format - what
is generic.

>
> I am fairly sure that there actually are third-party client libraries
> that have more intelligence about COPY than psql, but I do not remember
> any specifics unfortunately.
>

The COPY RAW should not to break any existing application. This is new
feature - and old application, old client use COPY RAW newer. I see as
important the conformity of used mode (text/binary) and PQbinaryTuples()
and PQfformat().

I am writing few lines as summary:

1. invention RAW_TEXT and RAW_BINARY
2. for RAW_BINARY: PQbinaryTuples() returns 1 and PQfformat() returns 1
3.a for RAW_TEXT: PQbinaryTuples() returns 0 and PQfformat() returns 0, but
the client should to check PQcopyFormat() to not print "\n" on the end
3.b for RAW_TEXT: PQbinaryTuples() returns 1 and PQfformat() returns 1, but
used output function, not necessary client modification
4. PQcopyFormat() returns 0 for text, 1 for binary, 2 for RAW_TEXT, 3 for
RAW_BINARY
5. create tests for ecpg

Is it ok?

What do you prefer 3.a, or 3.b?

Regards

Pavel

>
> regards, tom lane
>

Re: [HACKERS] raw output from copy

2016-03-29 Thread Tom Lane

Pavel Stehule  writes:
> I tested COPY RAW on old psql clients - and it is working without any
> problem - so when the client uses same logic as psql, then it should to
> work. Sure, there can be differently implemented clients, but the COPY
> client side is usually simple - store stream to output.

My point is precisely that I doubt all clients are that stupid about COPY.

> Maybe I am blind, but I don't see any new security risks. The risk can be
> only on client side - and if client is not able work with new value, then
> it can fails.

Well, the point is that low-level code might get used to process the data
stream for commands it doesn't have any control over.  Maybe there's no
realistic security risk there, or maybe there is; I'm not sure.

> I am thinking so PQbinaryTuples should to return 1 (without change), and
> PQfformat should to return 2.

Well, that seems pretty backwards to me.  The format of the individual
fields is still what it is under COPY BINARY; you would not use a
different per-field transformation.  You do need to know about the
overall format of the copy data stream being different, and defining
PQbinaryTuples as still returning 1 means there's no clean way to
understand overall copy format vs. per-field format.

There's a case to be made that we should invent a new function named
along the lines of PQcopyFormat() rather than overloading PQbinaryTuples()
some more.  That function is currently deprecated and I'm not very happy
with un-deprecating it only to use it in a confusing way.

To be more concrete about this: I think it's actually rather broken
that this patch ties RAW to binary format of the field contents.
Why would it not be exactly as useful to have delimiter-less COPY
of textual data, for use when there's just one datum and/or you're
confident in picking the data apart for yourself?  But as things stand
it'd be too confusing for an application to try to figure out what's
happening in such a case.

So I think we should either invent RAW_TEXT and RAW_BINARY formats
(not just RAW) or make RAW be an orthogonal copy option.  And we need
to improve libpq's behavior enough so that applications can sanely
figure out what's happening.

> I executed all tests in libpq and ecpg without any problems. Can you,
> please, help me with repeating a ecpg issues?

Of course the ecpg tests pass; you didn't extend them to see what would
happen if someone tries COPY RAW with ecpg.   Likewise, we have no tests
exercising a client's use of libpq with more intelligence than psql has
got.  But that doesn't mean it's acceptable to write this patch with no
thought for such clients.

I am fairly sure that there actually are third-party client libraries
that have more intelligence about COPY than psql, but I do not remember
any specifics unfortunately.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-29 Thread Pavel Stehule

Hi

2016-03-29 0:26 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > [ copy-raw-format-20160227-03.patch ]
>
> I looked at this patch.  I'm having a hard time accepting that it has
> a use-case large enough to justify it, and here's the reason: it's
> a protocol break.  Conveniently omitting to update protocol.sgml
> doesn't make it not a protocol break.  (libpq.sgml also contains
> assorted statements that are falsified by this patch.)
>
> You could argue that it's the user's own fault if he tries to use
> COPY RAW with client-side code that hasn't been updated to support it.
> Maybe that's okay, but I wonder if we're opening ourselves up to
> problems.  Maybe even security-grade problems.
>

I tested COPY RAW on old psql clients - and it is working without any
problem - so when the client uses same logic as psql, then it should to
work. Sure, there can be differently implemented clients, but the COPY
client side is usually simple - store stream to output.

Maybe I am blind, but I don't see any new security risks. The risk can be
only on client side - and if client is not able work with new value, then
it can fails. But any attacker can use fake data stream, and can enforce
this error too. So if there are some security risks on special designed
clients, then this risks is existing now.

> In terms of specific code that hasn't been updated, ecpg is broken
> by this patch, and I'm not very sure what libpq's PQbinaryTuples()
> ought to do but probably something other than what it does today.
>
> There's also a definitional question of what we think PQfformat() ought
> to do; should it return "2" for the per-field format?  Or maybe the
> per-field format is still "1", since it's after all the same binary data
> format as for COPY BINARY, and only the overall copy format reported by
> PQbinaryTuples() should change to "2".
>

Theoretically the change there is allowed - "Format code zero indicates
textual data representation, while format code one indicates binary
representation. (Other codes are reserved for future definition.) -
PQfformat". But - the format of COPY RAW is binary - this format is cleaner
binary format than is used by COPY BINARY (where is a header + BINARY). I
am thinking so PQbinaryTuples should to return 1 (without change), and
PQfformat should to return 2. If some older client uses deprecated function
PQbinaryTuples(), then 1 is safe value. PQfformat() is documented
differently and if there will be different than expected value, then the
client should to raise a error. So using 2 is safe there. The value 2 is
adequate to actual content

Packet: t=1459265078.596466, session=213070643360702
PGSQL: type=Query, F -> B
QUERY query=copy foo(x) to stdout (format raw);

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyOutResponse, B -> F
COPY OUT RESPONSE copy format=1, num_fields=1, fields_formats=2

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyData, B -> F
COPY DATA len=20

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CopyDone, B -> F
COPY DONE

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=CommandComplete, B -> F
COMMAND COMPLETE command='COPY 1'

Packet: t=1459265078.597755, session=213070643360702
PGSQL: type=ReadyForQuery, B -> F
READY FOR QUERY type=

What do you think ?

p.s. These values are returned now

PQfformat(*results, 0)) returns 2 already, PQbinaryTuples() returns 1.

> BTW, I'm not really sure why the patch is trying to enforce single
> row and column for the COPY OUT case.  I thought the idea for that
> was that we'd just shove out the data without any delimiters, and
> if it's more than one datum it's the user's problem whether he can
> identify the boundaries.  On the input side we would have to insist
> on one column since we're not going to attempt to identify boundaries
> (and one row would fall out of the fact that we slurp the entire input
> and treat it as one datum).
>
> Anyway this is certainly not committable as-is, so I'm setting it back
> to Waiting on Author.  But the fact that both libpq and ecpg would need
> updates makes me question whether we can safely pretend that this isn't
> a protocol break.
>

I executed all tests in libpq and ecpg without any problems. Can you,
please, help me with repeating a ecpg issues?

Regards

Pavel

>
> regards, tom lane
>

Re: [HACKERS] raw output from copy

2016-03-28 Thread Pavel Stehule

Hi


>>> Anyway this is certainly not committable as-is, so I'm setting it back
>>> to Waiting on Author.  But the fact that both libpq and ecpg would need
>>> updates makes me question whether we can safely pretend that this isn't
>>> a protocol break.
>>>
>>>
>>>
>>
>>
>> In that case I humbly submit that there is a case for reviving the psql
>> patch I posted that kicked off this whole thing and lets you export a piece
>> of binary data from psql quite easily. It should certainly not involve any
>> protocol break.
>>
>
> The psql only solution can work only for output. Doesn't help with input.
>

In this case, I am thinking so the features of COPY statement is perfect
for this feature. The way from a  content to the file is direct. In psql
you have to off - tuple separator, record separator, you have to set output
file. You can get same effect, but with more work. In previous version it
was relatively  hard to use it from command line - now, with multi command
-c is much simpler, but still the COPY is the ideal.

I agree, so output formats of psql is nice feature. And should be pretty
nice, if we support more common formats - like csv, simple xml, simple
json. I believe so sometime the redundancy is acceptable, if the cost is
not too high.

sorry for offtopic - I would to see some output format on client side, but
the format possibilities are on server side. So there are natural idea -
define server side output format. psql output format just can wrap it.

Regards

Pavel


> Regards
>
> Pavel
>
>
>>
>> cheers
>>
>> andrew
>>
>
>

Re: [HACKERS] raw output from copy

2016-03-28 Thread Pavel Stehule

Hi

2016-03-29 0:26 GMT+02:00 Tom Lane :

> Pavel Stehule  writes:
> > [ copy-raw-format-20160227-03.patch ]
>
> I looked at this patch.  I'm having a hard time accepting that it has
> a use-case large enough to justify it, and here's the reason: it's
> a protocol break.  Conveniently omitting to update protocol.sgml
> doesn't make it not a protocol break.  (libpq.sgml also contains
> assorted statements that are falsified by this patch.)
>

The reply on this question depends how we would to be strict. This doesn't
change the format in types stream, but it creates new enum value. Correctly
written should to raise exception when is processing unknown enum value.

I'll do tests against old libpq.


>
> You could argue that it's the user's own fault if he tries to use
> COPY RAW with client-side code that hasn't been updated to support it.
> Maybe that's okay, but I wonder if we're opening ourselves up to
> problems.  Maybe even security-grade problems.
>
> In terms of specific code that hasn't been updated, ecpg is broken
> by this patch, and I'm not very sure what libpq's PQbinaryTuples()
> ought to do but probably something other than what it does today.
>
> There's also a definitional question of what we think PQfformat() ought
> to do; should it return "2" for the per-field format?  Or maybe the
> per-field format is still "1", since it's after all the same binary data
> format as for COPY BINARY, and only the overall copy format reported by
> PQbinaryTuples() should change to "2".
>
>
I'll recheck it


> BTW, I'm not really sure why the patch is trying to enforce single
> row and column for the COPY OUT case.  I thought the idea for that
> was that we'd just shove out the data without any delimiters, and
> if it's more than one datum it's the user's problem whether he can
> identify the boundaries.  On the input side we would have to insist
> on one column since we're not going to attempt to identify boundaries
> (and one row would fall out of the fact that we slurp the entire input
> and treat it as one datum).
>

It should not be problem. I though about it. The COPY statements is
extensible with options. We can support more fields, more rows if it will
be required with additional options. But now, it looks like premature
optimization.


>
> Anyway this is certainly not committable as-is, so I'm setting it back
> to Waiting on Author.  But the fact that both libpq and ecpg would need
> updates makes me question whether we can safely pretend that this isn't
> a protocol break.
>

I'll do test against some clients.

Regards

Pavel


>
> regards, tom lane
>

Re: [HACKERS] raw output from copy

2016-03-28 Thread Pavel Stehule

2016-03-29 5:12 GMT+02:00 Andrew Dunstan :

>
>
> On 03/28/2016 06:26 PM, Tom Lane wrote:
>
>> Pavel Stehule  writes:
>>
>>> [ copy-raw-format-20160227-03.patch ]
>>>
>> I looked at this patch.  I'm having a hard time accepting that it has
>> a use-case large enough to justify it, and here's the reason: it's
>> a protocol break.  Conveniently omitting to update protocol.sgml
>> doesn't make it not a protocol break.  (libpq.sgml also contains
>> assorted statements that are falsified by this patch.)
>>
>> You could argue that it's the user's own fault if he tries to use
>> COPY RAW with client-side code that hasn't been updated to support it.
>> Maybe that's okay, but I wonder if we're opening ourselves up to
>> problems.  Maybe even security-grade problems.
>>
>> In terms of specific code that hasn't been updated, ecpg is broken
>> by this patch, and I'm not very sure what libpq's PQbinaryTuples()
>> ought to do but probably something other than what it does today.
>>
>> There's also a definitional question of what we think PQfformat() ought
>> to do; should it return "2" for the per-field format?  Or maybe the
>> per-field format is still "1", since it's after all the same binary data
>> format as for COPY BINARY, and only the overall copy format reported by
>> PQbinaryTuples() should change to "2".
>>
>> BTW, I'm not really sure why the patch is trying to enforce single
>> row and column for the COPY OUT case.  I thought the idea for that
>> was that we'd just shove out the data without any delimiters, and
>> if it's more than one datum it's the user's problem whether he can
>> identify the boundaries.  On the input side we would have to insist
>> on one column since we're not going to attempt to identify boundaries
>> (and one row would fall out of the fact that we slurp the entire input
>> and treat it as one datum).
>>
>> Anyway this is certainly not committable as-is, so I'm setting it back
>> to Waiting on Author.  But the fact that both libpq and ecpg would need
>> updates makes me question whether we can safely pretend that this isn't
>> a protocol break.
>>
>>
>>
>
>
> In that case I humbly submit that there is a case for reviving the psql
> patch I posted that kicked off this whole thing and lets you export a piece
> of binary data from psql quite easily. It should certainly not involve any
> protocol break.
>

The psql only solution can work only for output. Doesn't help with input.

Regards

Pavel


>
> cheers
>
> andrew
>

Re: [HACKERS] raw output from copy

2016-03-28 Thread Andrew Dunstan




On 03/28/2016 06:26 PM, Tom Lane wrote:

Pavel Stehule  writes:

[ copy-raw-format-20160227-03.patch ]

I looked at this patch.  I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break.  Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break.  (libpq.sgml also contains
assorted statements that are falsified by this patch.)

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems.  Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format?  Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case.  I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries.  On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author.  But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.





In that case I humbly submit that there is a case for reviving the psql 
patch I posted that kicked off this whole thing and lets you export a 
piece of binary data from psql quite easily. It should certainly not 
involve any protocol break.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-28 Thread Tom Lane

Pavel Stehule  writes:
> [ copy-raw-format-20160227-03.patch ]

I looked at this patch.  I'm having a hard time accepting that it has
a use-case large enough to justify it, and here's the reason: it's
a protocol break.  Conveniently omitting to update protocol.sgml
doesn't make it not a protocol break.  (libpq.sgml also contains
assorted statements that are falsified by this patch.)

You could argue that it's the user's own fault if he tries to use
COPY RAW with client-side code that hasn't been updated to support it.
Maybe that's okay, but I wonder if we're opening ourselves up to
problems.  Maybe even security-grade problems.

In terms of specific code that hasn't been updated, ecpg is broken
by this patch, and I'm not very sure what libpq's PQbinaryTuples()
ought to do but probably something other than what it does today.

There's also a definitional question of what we think PQfformat() ought
to do; should it return "2" for the per-field format?  Or maybe the
per-field format is still "1", since it's after all the same binary data
format as for COPY BINARY, and only the overall copy format reported by
PQbinaryTuples() should change to "2".

BTW, I'm not really sure why the patch is trying to enforce single
row and column for the COPY OUT case.  I thought the idea for that
was that we'd just shove out the data without any delimiters, and
if it's more than one datum it's the user's problem whether he can
identify the boundaries.  On the input side we would have to insist
on one column since we're not going to attempt to identify boundaries
(and one row would fall out of the fact that we slurp the entire input
and treat it as one datum).

Anyway this is certainly not committable as-is, so I'm setting it back
to Waiting on Author.  But the fact that both libpq and ecpg would need
updates makes me question whether we can safely pretend that this isn't
a protocol break.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-19 Thread David Steele


On 3/12/16 1:24 AM, Pavel Stehule wrote:


Great, thank you very much. I hope so this feature really useful. It
allow to simple export/import XML doc in different encodings, JSONs and
can be enhanced future via options. The nice feature  (but not for this
release) can be additional cast info for import -- like "COPY
table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because
there are the options, there are big space for other enhancing.


Andres Karlsson pointed out that this patch has two CF entries:

https://commitfest.postgresql.org/9/223/
https://commitfest.postgresql.org/9/547/

I closed the one that was in the "needs review" (547) state and kept the 
one that is "ready for committer" (223).


--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-19 Thread Pavel Stehule

2016-03-18 16:32 GMT+01:00 David Steele :

> On 3/12/16 1:24 AM, Pavel Stehule wrote:
>
> Great, thank you very much. I hope so this feature really useful. It
>> allow to simple export/import XML doc in different encodings, JSONs and
>> can be enhanced future via options. The nice feature  (but not for this
>> release) can be additional cast info for import -- like "COPY
>> table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because
>> there are the options, there are big space for other enhancing.
>>
>
> Andres Karlsson pointed out that this patch has two CF entries:
>
> https://commitfest.postgresql.org/9/223/
> https://commitfest.postgresql.org/9/547/
>
> I closed the one that was in the "needs review" (547) state and kept the
> one that is "ready for committer" (223).
>

sure, thank you

Pavel


>
> --
> -David
> da...@pgmasters.net
>

Re: [HACKERS] raw output from copy

2016-03-11 Thread Pavel Stehule

Hi

2016-03-09 18:41 GMT+01:00 Corey Huinker :

>
>>> The regression tests seem to adequately cover all new functionality,
>>> though I wonder if we should add some cases that highlight situations where
>>> BINARY mode is insufficient.
>>>
>>>
> One thing I tried to test RAW was to load an existing json file.
>
> My own personal test was to load an existing .json file into a 1x1 bytea
> table, which worked. From there I was able to
> select encode(col_name,'escape')::text::jsonb from test_table
> and the json was correctly converted.
>
> A similar test copying binary failed.
>
> A write up of the test looks like this:
>
>
> \copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);
> COPY 1
> create temporary table raw_byte (b bytea);
> CREATE TABLE
> create temporary table raw_text (t text);
> CREATE TABLE
> \copy raw_jsonb from '/tmp/raw_test.blob' (format raw);
> psql:/home/ubuntu/raw_test.sql:9: ERROR:  relation "raw_jsonb" does not
> exist
> \copy raw_byte from '/tmp/raw_test.blob' (format raw);
> COPY 1
> select encode(b,'escape')::text::json from raw_byte;
>  encode
> 
>  {"foo": "bar"}
> (1 row)
>
> \copy raw_text from '/tmp/raw_test.blob' (format raw);
> COPY 1
> select t::jsonb from raw_text;
>t
> 
>  {"foo": "bar"}
> (1 row)
>
> create temporary table binary_byte (b bytea);
> CREATE TABLE
> create temporary table binary_text (t text);
> CREATE TABLE
> \copy binary_byte from '/tmp/raw_test.blob' (format binary);
> psql:/home/ubuntu/raw_test.sql:22: ERROR:  COPY file signature not
> recognized
> select encode(b,'escape')::jsonb from binary_byte;
>  encode
> 
> (0 rows)
>
> \copy binary_text from '/tmp/raw_test.blob' (format binary);
> psql:/home/ubuntu/raw_test.sql:26: ERROR:  COPY file signature not
> recognized
> select t::jsonb from binary_text;
>  t
> ---
> (0 rows)
>
>
> So, *if* we want to add a regression test to demonstrate to posterity why
> we need RAW for cases that BINARY can't handle, I offer the attached file.
>

I don't think so regress tests should to do this demonstration. It is
clean, so COPY BINARY should to fail every time, and then there is not any
benefit from it in regress tests. There are lot of discussion in this
thread, and we don't need to inject more "garbage" to regress tests.


>
> Does anyone else see value in adding that to the regression tests?
>

>
>
>> Before I give my approval, I want to read it again more closely to make
>>> sure that no cases were skipped with regard to the  (binary || raw) and
>>> (binary || !raw) tests. Also, I want to use it on some of my problematic
>>> files. Maybe I'll find a good edge case. Probably not.
>>>
>>
> I don't know why I thought this, but when I looked at the patch, I assumed
> that the ( binary || raw ) tests were part of a large if/elseif/else
> waterfall. They are not. They stand alone. There are no edge cases to find.
>

This is organized to files by necessity to work with external files. The
regress tests for COPY RAW has about 100 lines - so why need special files
and infrastructure. COPY RAW, COPY BINARY tests well shares infrastructure.


>
> Review complete and passed. I can re-review if we want to add the
> additional test.
>
>
Great, thank you very much. I hope so this feature really useful. It allow
to simple export/import XML doc in different encodings, JSONs and can be
enhanced future via options. The nice feature  (but not for this release)
can be additional cast info for import -- like "COPY table(jsonb_column)
FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because there are the options,
there are big space for other enhancing.

Regards

Pavel

Re: [HACKERS] raw output from copy

2016-03-09 Thread Corey Huinker

>
>
>> The regression tests seem to adequately cover all new functionality,
>> though I wonder if we should add some cases that highlight situations where
>> BINARY mode is insufficient.
>>
>>
One thing I tried to test RAW was to load an existing json file.

My own personal test was to load an existing .json file into a 1x1 bytea
table, which worked. From there I was able to
select encode(col_name,'escape')::text::jsonb from test_table
and the json was correctly converted.

A similar test copying binary failed.

A write up of the test looks like this:


\copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);
COPY 1
create temporary table raw_byte (b bytea);
CREATE TABLE
create temporary table raw_text (t text);
CREATE TABLE
\copy raw_jsonb from '/tmp/raw_test.blob' (format raw);
psql:/home/ubuntu/raw_test.sql:9: ERROR:  relation "raw_jsonb" does not
exist
\copy raw_byte from '/tmp/raw_test.blob' (format raw);
COPY 1
select encode(b,'escape')::text::json from raw_byte;
 encode

 {"foo": "bar"}
(1 row)

\copy raw_text from '/tmp/raw_test.blob' (format raw);
COPY 1
select t::jsonb from raw_text;
   t

 {"foo": "bar"}
(1 row)

create temporary table binary_byte (b bytea);
CREATE TABLE
create temporary table binary_text (t text);
CREATE TABLE
\copy binary_byte from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:22: ERROR:  COPY file signature not
recognized
select encode(b,'escape')::jsonb from binary_byte;
 encode

(0 rows)

\copy binary_text from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:26: ERROR:  COPY file signature not
recognized
select t::jsonb from binary_text;
 t
---
(0 rows)


So, *if* we want to add a regression test to demonstrate to posterity why
we need RAW for cases that BINARY can't handle, I offer the attached file.

Does anyone else see value in adding that to the regression tests?



> Before I give my approval, I want to read it again more closely to make
>> sure that no cases were skipped with regard to the  (binary || raw) and
>> (binary || !raw) tests. Also, I want to use it on some of my problematic
>> files. Maybe I'll find a good edge case. Probably not.
>>
>
I don't know why I thought this, but when I looked at the patch, I assumed
that the ( binary || raw ) tests were part of a large if/elseif/else
waterfall. They are not. They stand alone. There are no edge cases to find.

Review complete and passed. I can re-review if we want to add the
additional test.


raw_test.sql
Description: application/sql

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-08 Thread Pavel Stehule

2016-03-04 18:06 GMT+01:00 Pavel Stehule :

>
>
> 2016-03-04 15:54 GMT+01:00 Daniel Verite :
>
>> Corey Huinker wrote:
>>
>> > So, for me, RAW is the right solution, or at least *a* right solution.
>>
>> Questions on how to extract from a bytea column come up on a regular
>> basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
>> are to encode the contents in text and decode them in an additional
>> step, or use COPY BINARY and filter out the headers.
>>
>> But none of this is as straightforward and efficient as the proposed
>> COPY RAW.
>> Also the conversion to text can't be used at all on very large
>> contents (>512MB), as mentioned in another recent thread [5]
>> (this is the same reason why pg_dump can't dump such rows),
>> but COPY RAW doesn't have this limitation.
>>
>> Technically COPY BINARY should be sufficient, but it seems that
>> people dislike having to deal with its headers.
>>
> Also it's not supported by any of the drivers of popular
>> script languages that otherwise provide COPY in text format
>> (DBD::Pg, php, psycopg2...)
>> Maybe the RAW format would have a better chance to get support
>> there, because of its simplicity.
>>
>
> exactly - I would to decrease dependency on PostgreSQL internals. Working
> with clean content is simple and possible with any environment without
> unclean operations.
>

COPY RAW can be used for import. I am not sure if this use case was tested.

cat image.jpg | psql -c "CREATE TEMP TABLE auxbuf(image bytea); COPY
auxbuf(image) FROM stdin RAW; ..." postgres

Regards

Pavel


> Regards
>
> Pavel
>
>
>>
>> [1]
>>
>> http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji
>> ng
>>
>> [2] http://www.postgresql.org/message-id/4c8272c4.1000...@arcor.de
>>
>> [3] http://stackoverflow.com/questions/6730729
>>
>> [4]
>> http://www.postgresql.org/message-id/56c66565.50...@consistentstate.com
>>
>> [5] http://www.postgresql.org/message-id/14620.1456851...@sss.pgh.pa.us
>>
>>
>> Best regards,
>> --
>> Daniel Vérité
>> PostgreSQL-powered mailer: http://www.manitou-mail.org
>> Twitter: @DanielVerite
>>
>
>

Re: [HACKERS] raw output from copy

2016-03-04 Thread Pavel Stehule

2016-03-04 15:54 GMT+01:00 Daniel Verite :

> Corey Huinker wrote:
>
> > So, for me, RAW is the right solution, or at least *a* right solution.
>
> Questions on how to extract from a bytea column come up on a regular
> basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
> are to encode the contents in text and decode them in an additional
> step, or use COPY BINARY and filter out the headers.
>
> But none of this is as straightforward and efficient as the proposed
> COPY RAW.
> Also the conversion to text can't be used at all on very large
> contents (>512MB), as mentioned in another recent thread [5]
> (this is the same reason why pg_dump can't dump such rows),
> but COPY RAW doesn't have this limitation.
>
> Technically COPY BINARY should be sufficient, but it seems that
> people dislike having to deal with its headers.
>
Also it's not supported by any of the drivers of popular
> script languages that otherwise provide COPY in text format
> (DBD::Pg, php, psycopg2...)
> Maybe the RAW format would have a better chance to get support
> there, because of its simplicity.
>

exactly - I would to decrease dependency on PostgreSQL internals. Working
with clean content is simple and possible with any environment without
unclean operations.

Regards

Pavel


>
> [1]
>
> http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji
> ng
>
> [2] http://www.postgresql.org/message-id/4c8272c4.1000...@arcor.de
>
> [3] http://stackoverflow.com/questions/6730729
>
> [4]
> http://www.postgresql.org/message-id/56c66565.50...@consistentstate.com
>
> [5] http://www.postgresql.org/message-id/14620.1456851...@sss.pgh.pa.us
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>

Re: [HACKERS] raw output from copy

2016-03-04 Thread Daniel Verite

Corey Huinker wrote:

> So, for me, RAW is the right solution, or at least *a* right solution.

Questions on how to extract from a bytea column come up on a regular
basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
are to encode the contents in text and decode them in an additional
step, or use COPY BINARY and filter out the headers.

But none of this is as straightforward and efficient as the proposed
COPY RAW.
Also the conversion to text can't be used at all on very large
contents (>512MB), as mentioned in another recent thread [5]
(this is the same reason why pg_dump can't dump such rows),
but COPY RAW doesn't have this limitation.

Technically COPY BINARY should be sufficient, but it seems that
people dislike having to deal with its headers.
Also it's not supported by any of the drivers of popular
script languages that otherwise provide COPY in text format
(DBD::Pg, php, psycopg2...)
Maybe the RAW format would have a better chance to get support
there, because of its simplicity.

[1]
http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanliji
ng

[2] http://www.postgresql.org/message-id/4c8272c4.1000...@arcor.de

[3] http://stackoverflow.com/questions/6730729

[4] http://www.postgresql.org/message-id/56c66565.50...@consistentstate.com

[5] http://www.postgresql.org/message-id/14620.1456851...@sss.pgh.pa.us


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2016-03-04 Thread Ildar Musin


Hi Pavel

27/02/16 10:26, Pavel Stehule пишет:

Hi

2015-08-06 10:37 GMT+02:00 Pavel Stehule >:


Hi,

Psql based implementation needs new infrastructure (more than few
lines)

Missing:

* binary mode support
* parametrized query support,

I am not against, but both points I proposed, and both was rejected.

So why dont use current infrastructure? Raw copy is trivial patch.


I was asked by Daniel Verite about reopening this patch in opened 
commitfest.


I am sending rebased patch

Regards

Pavel


I am new to reviewing, here is what I got. Patch have been applied 
nicely to the HEAD. I tried to upload and export files in psql, it works 
as expected. All regression tests are passed without problems as well. 
Code looks good for me. There is a little confusion for me in this line 
of documentation:


"use any metadata - only row data in network byte order are exported"

Did you mean "only raw data in network byte order is exported"?

And there are two entries for this patch on commitfest page: in 
"miscellaneous" and "sql" sections. Probably it's better to remove one 
of them to avoid confusion.


--
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] raw output from copy

2016-03-03 Thread Pavel Stehule

2016-03-04 3:13 GMT+01:00 Corey Huinker :

> On Sat, Feb 27, 2016 at 2:26 AM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> 2015-08-06 10:37 GMT+02:00 Pavel Stehule :
>>
>>> Hi,
>>>
>>> Psql based implementation needs new infrastructure (more than few lines)
>>>
>>> Missing:
>>>
>>> * binary mode support
>>> * parametrized query support,
>>>
>>> I am not against, but both points I proposed, and both was rejected.
>>>
>>> So why dont use current infrastructure? Raw copy is trivial patch.
>>>
>>
>> I was asked by Daniel Verite about reopening this patch in opened
>> commitfest.
>>
>> I am sending rebased patch
>>
>> Regards
>>
>> Pavel
>>
>>
>>
> Since this patch does something I need for my own work, I've signed up as
> a reviewer.
>
> From a design standpoint, I feel that COPY is the preferred means of
> dealing with data from sources too transient to justify setting up a
> foreign data wrapper, and too simple to justify writing application code.
> So, for me, RAW is the right solution, or at least *a* right solution.
>

my opinion is same - there all necessary infrastructure is ready and when
we work with IO, then we use COPY natively. I hope so main use case (export
bytea) is solved, but there are a possibility to enhance this command by
COPY options - what is, I am thinking, a advantage of this way.


>
> My first pass of reading the code changes and the regression tests is
> complete, and I found the changes to be clear and fairly straightforward.
> This shouldn't surprise anyone, as the previous reviewers had only minor
> quibbles with the code. So far, so good.
>
> The regression tests seem to adequately cover all new functionality,
> though I wonder if we should add some cases that highlight situations where
> BINARY mode is insufficient.
>
> Before I give my approval, I want to read it again more closely to make
> sure that no cases were skipped with regard to the  (binary || raw) and
> (binary || !raw) tests. Also, I want to use it on some of my problematic
> files. Maybe I'll find a good edge case. Probably not.
>
> I hope to find time for those things in the next few days.
>

Thank you very much

Regards

Pavel

Re: [HACKERS] raw output from copy

2016-03-03 Thread Corey Huinker

On Sat, Feb 27, 2016 at 2:26 AM, Pavel Stehule 
wrote:

> Hi
>
> 2015-08-06 10:37 GMT+02:00 Pavel Stehule :
>
>> Hi,
>>
>> Psql based implementation needs new infrastructure (more than few lines)
>>
>> Missing:
>>
>> * binary mode support
>> * parametrized query support,
>>
>> I am not against, but both points I proposed, and both was rejected.
>>
>> So why dont use current infrastructure? Raw copy is trivial patch.
>>
>
> I was asked by Daniel Verite about reopening this patch in opened
> commitfest.
>
> I am sending rebased patch
>
> Regards
>
> Pavel
>
>
>
Since this patch does something I need for my own work, I've signed up as a
reviewer.

>From a design standpoint, I feel that COPY is the preferred means of
dealing with data from sources too transient to justify setting up a
foreign data wrapper, and too simple to justify writing application code.
So, for me, RAW is the right solution, or at least *a* right solution.

My first pass of reading the code changes and the regression tests is
complete, and I found the changes to be clear and fairly straightforward.
This shouldn't surprise anyone, as the previous reviewers had only minor
quibbles with the code. So far, so good.

The regression tests seem to adequately cover all new functionality, though
I wonder if we should add some cases that highlight situations where BINARY
mode is insufficient.

Before I give my approval, I want to read it again more closely to make
sure that no cases were skipped with regard to the  (binary || raw) and
(binary || !raw) tests. Also, I want to use it on some of my problematic
files. Maybe I'll find a good edge case. Probably not.

I hope to find time for those things in the next few days.

Re: [HACKERS] raw output from copy

2016-02-26 Thread Pavel Stehule

Hi

2015-08-06 10:37 GMT+02:00 Pavel Stehule :

> Hi,
>
> Psql based implementation needs new infrastructure (more than few lines)
>
> Missing:
>
> * binary mode support
> * parametrized query support,
>
> I am not against, but both points I proposed, and both was rejected.
>
> So why dont use current infrastructure? Raw copy is trivial patch.
>

I was asked by Daniel Verite about reopening this patch in opened
commitfest.

I am sending rebased patch

Regards

Pavel
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
new file mode 100644
index 07e2f45..68fbfd8
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*** COPY { ta
*** 197,203 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
!   or binary.
The default is text.
   
  
--- 197,203 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
!   binary or raw.
The default is text.
   
  
*** OIDs to be shown as null if that ever pr
*** 888,893 
--- 888,925 
  
 

+ 
+   
+  Raw Format
+ 
+
+ The raw format option causes all data to be
+ stored/read as binary format rather than as text. It shares format
+ for data with binary format. This format doesn't
+ use any metadata - only row data in network byte order are exported
+ or imported.
+
+ 
+
+ Because this format doesn't support any delimiter, only one value
+ can be exported or imported. NULL values are not allowed.
+
+
+ The raw format can be used for export or import
+ bytea values.
+ 
+ COPY images(data) FROM '/usr1/proj/img/01.jpg' (FORMAT raw);
+ 
+ It can be used successfully for export XML in different encoding
+ or import valid XML document with any supported encoding:
+ 
+
+   
   
  
   
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 3201476..beb9152
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** typedef enum EolType
*** 89,94 
--- 89,99 
   * it's faster to make useless comparisons to trailing bytes than it is to
   * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
   * when we have to do it the hard way.
+  *
+  * COPY supports three modes: text, binary and raw. The text format is plain
+  * text multiline format with specified delimiter. The binary format holds
+  * metadata (numbers, sizes) and data. The raw format holds data only and
+  * only one non NULL value can be processed.
   */
  typedef struct CopyStateData
  {
*** typedef struct CopyStateData
*** 110,115 
--- 115,121 
  	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
  	bool		is_program;		/* is 'filename' a program to popen? */
  	bool		binary;			/* binary format? */
+ 	bool		raw;			/* required raw binary? */
  	bool		oids;			/* include OIDs? */
  	bool		freeze;			/* freeze rows on loading? */
  	bool		csv_mode;		/* Comma Separated Value format? */
*** typedef struct CopyStateData
*** 199,204 
--- 205,213 
  	char	   *raw_buf;
  	int			raw_buf_index;	/* next byte to process */
  	int			raw_buf_len;	/* total # of bytes stored */
+ 
+ 	/* field for RAW mode */
+ 	bool		row_processed;		/* true, when first row was processed */
  } CopyStateData;
  
  /* DestReceiver for COPY (query) TO */
*** SendCopyBegin(CopyState cstate)
*** 342,350 
  		/* new way */
  		StringInfoData buf;
  		int			natts = list_length(cstate->attnumlist);
! 		int16		format = (cstate->binary ? 1 : 0);
  		int			i;
  
  		pq_beginmessage(, 'H');
  		pq_sendbyte(, format);		/* overall format */
  		pq_sendint(, natts, 2);
--- 351,366 
  		/* new way */
  		StringInfoData buf;
  		int			natts = list_length(cstate->attnumlist);
! 		int16		format;
  		int			i;
  
+ 		if (cstate->raw)
+ 			format = 2;
+ 		else if (cstate->binary)
+ 			format = 1;
+ 		else
+ 			format = 0;
+ 
  		pq_beginmessage(, 'H');
  		pq_sendbyte(, format);		/* overall format */
  		pq_sendint(, natts, 2);
*** SendCopyBegin(CopyState cstate)
*** 356,362 
  	else if (PG_PROTOCOL_MAJOR(FrontendProtocol) >= 2)
  	{
  		/* old way */
! 		if (cstate->binary)
  			ereport(ERROR,
  	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  			errmsg("COPY BINARY is not supported to stdout or from stdin")));
--- 372,378 
  	else if (PG_PROTOCOL_MAJOR(FrontendProtocol) >= 2)
  	{
  		/* old way */
! 		if (cstate->binary && cstate->raw)
  			ereport(ERROR,
  	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  			errmsg("COPY BINARY is not supported to stdout or from stdin")));
*** SendCopyBegin(CopyState cstate)
*** 368,374 
  	else
  	{
  		/* very old way */
! 		if (cstate->binary)
  			ereport(ERROR,
  	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),

Re: [HACKERS] raw output from copy

2015-08-06 Thread Pavel Stehule

Hi,

Psql based implementation needs new infrastructure (more than few lines)

Missing:

* binary mode support
* parametrized query support,

I am not against, but both points I proposed, and both was rejected.

So why dont use current infrastructure? Raw copy is trivial patch.
Dne 6.8.2015 0:09 napsal uživatel Andrew Dunstan and...@dunslane.net:


 On 08/05/2015 04:59 PM, Heikki Linnakangas wrote:

 On 07/27/2015 02:28 PM, Pavel Stehule wrote:

 2015-07-27 10:41 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi:

 What about input? This is a whole new feature, but it would be nice to be
 able to pass the file contents as a query parameter. Something like:

 \P /tmp/foo binary
 INSERT INTO foo VALUES (?);


 The example of input is strong reason, why don't do it via inserts. Only
 parsing some special ? symbol needs lot of new code.


 Sorry, I meant $1 in place of the ?. No special parsing needed, psql can
 send the query to the server as is, with the parameters that are given by
 this new mechanism.

 In this case, I don't see any advantage of  psql based solution. COPY is
 standard interface for input/output from/to files, and it should be used
 there.


 I'm not too happy with the COPY approach, although I won't object is one
 of the other committers feel more comfortable with it. However, we don't
 seem to be making progress here, so I'm going to mark this as Returned with
 Feedback. I don't feel good about that either, because I don't actually
 have any great suggestions on how to move this forward. Which is a pity
 because this is a genuine problem for users.



 This is really only a psql problem, IMNSHO. Inserting and extracting
 binary data is pretty trivial for most users of client libraries (e.g. it's
 a couple of lines of code in a DBD::Pg program), but it's hard in psql.

 I do agree that the COPY approach feels more than a little klunky.

 cheers

 andrew

Re: [HACKERS] raw output from copy

2015-08-05 Thread Heikki Linnakangas


On 07/27/2015 02:28 PM, Pavel Stehule wrote:

2015-07-27 10:41 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi:


What about input? This is a whole new feature, but it would be nice to be
able to pass the file contents as a query parameter. Something like:

\P /tmp/foo binary
INSERT INTO foo VALUES (?);


The example of input is strong reason, why don't do it via inserts. Only
parsing some special ? symbol needs lot of new code.


Sorry, I meant $1 in place of the ?. No special parsing needed, psql can 
send the query to the server as is, with the parameters that are given 
by this new mechanism.



In this case, I don't see any advantage of  psql based solution. COPY is
standard interface for input/output from/to files, and it should be used
there.


I'm not too happy with the COPY approach, although I won't object is one 
of the other committers feel more comfortable with it. However, we don't 
seem to be making progress here, so I'm going to mark this as Returned 
with Feedback. I don't feel good about that either, because I don't 
actually have any great suggestions on how to move this forward. Which 
is a pity because this is a genuine problem for users.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-08-05 Thread Andrew Dunstan



On 08/05/2015 04:59 PM, Heikki Linnakangas wrote:

On 07/27/2015 02:28 PM, Pavel Stehule wrote:

2015-07-27 10:41 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi:

What about input? This is a whole new feature, but it would be nice 
to be

able to pass the file contents as a query parameter. Something like:

\P /tmp/foo binary
INSERT INTO foo VALUES (?);


The example of input is strong reason, why don't do it via inserts. Only
parsing some special ? symbol needs lot of new code.


Sorry, I meant $1 in place of the ?. No special parsing needed, psql 
can send the query to the server as is, with the parameters that are 
given by this new mechanism.



In this case, I don't see any advantage of  psql based solution. COPY is
standard interface for input/output from/to files, and it should be used
there.


I'm not too happy with the COPY approach, although I won't object is 
one of the other committers feel more comfortable with it. However, we 
don't seem to be making progress here, so I'm going to mark this as 
Returned with Feedback. I don't feel good about that either, because I 
don't actually have any great suggestions on how to move this forward. 
Which is a pity because this is a genuine problem for users.





This is really only a psql problem, IMNSHO. Inserting and extracting 
binary data is pretty trivial for most users of client libraries (e.g. 
it's a couple of lines of code in a DBD::Pg program), but it's hard in psql.


I do agree that the COPY approach feels more than a little klunky.

cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-27 Thread Pavel Stehule

2015-07-27 10:41 GMT+02:00 Heikki Linnakangas hlinn...@iki.fi:

 On 07/27/2015 06:55 AM, Craig Ringer wrote:

 On 7 July 2015 at 14:32, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hi

 previous patch was broken, and buggy

 Here is new version with fixed upload and more tests


 I routinely see people trying to use COPY ... FORMAT binary to export
 a single binary field (like an image, for example) and getting
 confused by the header PostgreSQL adds. Or using text-format COPY and
 struggling with the hex escaping. It's clearly something people have
 trouble with.

 It doesn't help that while lo_import and lo_export can read paths
 outside the datadir (and refuse to read from within it),
 pg_read_binary_file is superuser only and disallows absolute paths.
 There's no corresponding pg_write_binary_file. So users who want to
 import and export a single binary field tend to try to use COPY. We
 have functionality for large objects that has no equivalent for
 'bytea'.

 I don't love the use of COPY for this, but it gets us support for
 arbitrary clients pretty easily. Otherwise it'd be server-side only
 via local filesystem access, or require special psql-specific
 functionality like we have for lo_import etc.


 COPY seems like a strange interface for this. I can see the point that the
 syntax is almost there already, for both input and output. But even that's
 not quite there yet, we'd need the new RAW format. And as an input method,
 COPY is a bit awkward, because you cannot easily pass the file to a
 function, for example. I think this should be implemented in psql, along
 the lines of Andrew's original \bcopy patch.

 There are a couple of related psql-features here actually, that would be
 useful on their own. The first is being able to send the query result to a
 file, for a single query only. You can currently do:

 \o /tmp/foo
 SELECT ...;
 \o

 But more often than not, when I try to do that, I forget to do the last
 \o, and run another query, and the output still goes to the file. So it'd
 be nice to have a \o option that only affects the next query. Something
 like:

 \O /tmp/foo
 SELECT ...;

 The second feature needed is to write the output without any headers, row
 delimiters and such. Just the datum. And the third feature is to write it
 in binary. Perhaps something like:

 \O /tmp/foo binary
 SELECT blob FROM foo WHERE id = 10;

 What about input? This is a whole new feature, but it would be nice to be
 able to pass the file contents as a query parameter. Something like:

 \P /tmp/foo binary
 INSERT INTO foo VALUES (?);


The example of input is strong reason, why don't do it via inserts. Only
parsing some special ? symbol needs lot of new code.

In this case, I don't see any advantage of  psql based solution. COPY is
standard interface for input/output from/to files, and it should be used
there.

Regards

Pavel




 - Heikki

Re: [HACKERS] raw output from copy

2015-07-27 Thread Heikki Linnakangas


On 07/27/2015 06:55 AM, Craig Ringer wrote:

On 7 July 2015 at 14:32, Pavel Stehule pavel.steh...@gmail.com wrote:

Hi

previous patch was broken, and buggy

Here is new version with fixed upload and more tests


I routinely see people trying to use COPY ... FORMAT binary to export
a single binary field (like an image, for example) and getting
confused by the header PostgreSQL adds. Or using text-format COPY and
struggling with the hex escaping. It's clearly something people have
trouble with.

It doesn't help that while lo_import and lo_export can read paths
outside the datadir (and refuse to read from within it),
pg_read_binary_file is superuser only and disallows absolute paths.
There's no corresponding pg_write_binary_file. So users who want to
import and export a single binary field tend to try to use COPY. We
have functionality for large objects that has no equivalent for
'bytea'.

I don't love the use of COPY for this, but it gets us support for
arbitrary clients pretty easily. Otherwise it'd be server-side only
via local filesystem access, or require special psql-specific
functionality like we have for lo_import etc.


COPY seems like a strange interface for this. I can see the point that 
the syntax is almost there already, for both input and output. But even 
that's not quite there yet, we'd need the new RAW format. And as an 
input method, COPY is a bit awkward, because you cannot easily pass the 
file to a function, for example. I think this should be implemented in 
psql, along the lines of Andrew's original \bcopy patch.


There are a couple of related psql-features here actually, that would be 
useful on their own. The first is being able to send the query result to 
a file, for a single query only. You can currently do:


\o /tmp/foo
SELECT ...;
\o

But more often than not, when I try to do that, I forget to do the last 
\o, and run another query, and the output still goes to the file. So 
it'd be nice to have a \o option that only affects the next query. 
Something like:


\O /tmp/foo
SELECT ...;

The second feature needed is to write the output without any headers, 
row delimiters and such. Just the datum. And the third feature is to 
write it in binary. Perhaps something like:


\O /tmp/foo binary
SELECT blob FROM foo WHERE id = 10;

What about input? This is a whole new feature, but it would be nice to 
be able to pass the file contents as a query parameter. Something like:


\P /tmp/foo binary
INSERT INTO foo VALUES (?);


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-26 Thread Craig Ringer

On 7 July 2015 at 14:32, Pavel Stehule pavel.steh...@gmail.com wrote:
 Hi

 previous patch was broken, and buggy

 Here is new version with fixed upload and more tests

I routinely see people trying to use COPY ... FORMAT binary to export
a single binary field (like an image, for example) and getting
confused by the header PostgreSQL adds. Or using text-format COPY and
struggling with the hex escaping. It's clearly something people have
trouble with.

It doesn't help that while lo_import and lo_export can read paths
outside the datadir (and refuse to read from within it),
pg_read_binary_file is superuser only and disallows absolute paths.
There's no corresponding pg_write_binary_file. So users who want to
import and export a single binary field tend to try to use COPY. We
have functionality for large objects that has no equivalent for
'bytea'.

I don't love the use of COPY for this, but it gets us support for
arbitrary clients pretty easily. Otherwise it'd be server-side only
via local filesystem access, or require special psql-specific
functionality like we have for lo_import etc.

The main point is that this is a real world thing. People want to do
it, try to do it, and have problems doing it. So it's a solution a
real issue.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-25 Thread Pavel Stehule

2015-07-23 22:05 GMT+02:00 Dickson S. Guedes lis...@guedesoft.net:

 2015-07-07 3:32 GMT-03:00 Pavel Stehule pavel.steh...@gmail.com:
 
  Hi
 
  previous patch was broken, and buggy
 
  Here is new version with fixed upload and more tests
 
  The interesting is so I should not to modify interface or client - so it
 should to work with any current driver with protocol support = 3.


Hi



 Hi Pavel,

 Here are some thoughts:

 1) from docs: only row data in network byte order are exported or
 imported.

 Should it be only raw data?


I don't understand well - it use a PostgreSQL buildin send functions -
and result of these functions is defined as in network byte order




 2) from docs Because this format doesn't support any delimiter, only
 one value  can be exported or imported. NULL values are not allowed.

 That only one value can be exported or imported is a little sad for
 someone with a table with more than one column that accepts bytea. The
 implemented feature doesn't covers the use-case where a table 'image'
 has columns: id integer, image bytea, thumbnail bytea, and I want to
 import binary data in that. We could put here the cases where we have
 NOT NULL columns. Since these are expected and the error messages
 complain about that couldn't them be covered in docs more explicitly?


This mode should not to replace current COPY binary mode. RAW binary output
for multiple fields is terrible complex task - you can use a fix length,
you can use some special separator etc. I remember a terrible complex
bulkload on Oracle or MSSQL - and I would to design it differently. I
prefer to have a COPY statement simple as possible - If you need
import/export all fields in record - then you can:

1. you can use a new LO api (for import) - load binary files as LO, INSERT
and drop used LO
2. call more COPY statements, and join exported files with operation system
tools (for export),
3. you can write specialized application that will support a COPY API and
export, import data in your preferred format.

The same complexity is with input, and I would not to write generic binary
files parser.




 3) from code: bool row_processed; /* true, when first row was processed
 */


in this mode is only one row - so first_row_processed sounds little bit
strange.



 Maybe rename the variable to something like `first_row_processed` and
 rip off the comment?

 4) from code:

 if (cstate-raw)
 format = 2;
 else if (cstate-binary)
 format = 1;
 else
 format = 0;

 Maybe create a constant for code readability?


good idea



 If by one side this feature does not covers a more generalized case,
 by other is a nice start, IMHO.


It is exactly what I don't would - the complexity of usage can go up to sky
with generic binary format file processing.

Regards

Pavel



 --
 Dickson S. Guedes
 mail/xmpp: gue...@guedesoft.net - skype: guediz
 http://github.com/guedes - http://guedesoft.net
 http://www.postgresql.org.br

Re: [HACKERS] raw output from copy

2015-07-23 Thread Dickson S. Guedes

2015-07-07 3:32 GMT-03:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 previous patch was broken, and buggy

 Here is new version with fixed upload and more tests

 The interesting is so I should not to modify interface or client - so it 
 should to work with any current driver with protocol support = 3.

Hi Pavel,

Here are some thoughts:

1) from docs: only row data in network byte order are exported or imported.

Should it be only raw data?

2) from docs Because this format doesn't support any delimiter, only
one value  can be exported or imported. NULL values are not allowed.

That only one value can be exported or imported is a little sad for
someone with a table with more than one column that accepts bytea. The
implemented feature doesn't covers the use-case where a table 'image'
has columns: id integer, image bytea, thumbnail bytea, and I want to
import binary data in that. We could put here the cases where we have
NOT NULL columns. Since these are expected and the error messages
complain about that couldn't them be covered in docs more explicitly?

3) from code: bool row_processed; /* true, when first row was processed */

Maybe rename the variable to something like `first_row_processed` and
rip off the comment?

4) from code:

if (cstate-raw)
format = 2;
else if (cstate-binary)
format = 1;
else
format = 0;

Maybe create a constant for code readability?


If by one side this feature does not covers a more generalized case,
by other is a nice start, IMHO.

-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-07 Thread Pavel Stehule

Hi

previous patch was broken, and buggy

Here is new version with fixed upload and more tests

The interesting is so I should not to modify interface or client - so it
should to work with any current driver with protocol support = 3.

Regards

Pavel



2015-07-06 23:34 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 here is a version with both direction support.

 postgres=# copy foo from '/tmp/1.jpg' (format raw);
 COPY 1
 Time: 93.021 ms
 postgres=# \dt+ foo
List of relations
 ┌┬──┬───┬───┬┬─┐
 │ Schema │ Name │ Type  │ Owner │  Size  │ Description │
 ╞╪══╪═══╪═══╪╪═╡
 │ public │ foo  │ table │ pavel │ 256 kB │ │
 └┴──┴───┴───┴┴─┘
 (1 row)

 postgres=# \copy foo to '~/3.jpg' (format raw)
 COPY 1
 Time: 2.401 ms

 Regards

 Pavel

 2015-07-02 17:02 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

 Andrew Dunstan and...@dunslane.net writes:
  Does the COPY line protocol even support binary data?

 The protocol, per se, just transmits a byte stream.  There is a field
 in the CopyInResponse/CopyOutResponse messages that indicates whether
 a text or binary copy is being done.  One thing we'd have to consider
 is whether raw mode is sufficiently different from binary to justify
 an additional value for this field, and if so whether that constitutes
 a protocol break.

 IIRC, psql wouldn't really care; it just transfers the byte stream to or
 from the target file, regardless of text or binary mode.  But there might
 be other client libraries that are smarter and expect binary mode to
 mean the binary file format specified in the COPY reference page.  So
 there may be value in being explicit about raw mode in these messages.

 A key point in all this is that people who need raw transfer probably
 need it in both directions, a point that your SELECT proposal cannot
 satisfy, but hacking COPY could.  So I lean towards the latter really.

 regards, tom lane



diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
new file mode 100644
index 2850b47..5739158
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*** COPY { replaceable class=parameterta
*** 190,196 
Selects the data format to be read or written:
literaltext/,
literalcsv/ (Comma Separated Values),
!   or literalbinary/.
The default is literaltext/.
   /para
  /listitem
--- 190,196 
Selects the data format to be read or written:
literaltext/,
literalcsv/ (Comma Separated Values),
!   literalbinary/ or literalraw/literal.
The default is literaltext/.
   /para
  /listitem
*** OIDs to be shown as null if that ever pr
*** 881,886 
--- 881,918 
  /para
 /refsect3
/refsect2
+ 
+   refsect2
+  titleRaw Format/title
+ 
+para
+ The literalraw/literal format option causes all data to be
+ stored/read as binary format rather than as text. It shares format
+ for data with literalbinary/literal format. This format doesn't
+ use any metadata - only row data in network byte order are exported
+ or imported.
+/para
+ 
+para
+ Because this format doesn't support any delimiter, only one value
+ can be exported or imported. NULL values are not allowed.
+/para
+para
+ The literalraw/literal format can be used for export or import
+ bytea values.
+ programlisting
+ COPY images(data) FROM '/usr1/proj/img/01.jpg' (FORMAT raw);
+ /programlisting
+ It can be used successfully for export XML in different encoding
+ or import valid XML document with any supported encoding:
+ screen![CDATA[
+ SET client_encoding TO latin2;
+ 
+ COPY (SELECT xmlelement(NAME data, 'Hello')) TO stdout (FORMAT raw);
+ ?xml version=1.0 encoding=LATIN2?dataHello/data
+ ]]/screen
+/para
+   /refsect2
   /refsect1
  
   refsect1
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
new file mode 100644
index 8904676..c69854c
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** typedef enum EolType
*** 92,97 
--- 92,102 
   * it's faster to make useless comparisons to trailing bytes than it is to
   * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
   * when we have to do it the hard way.
+  *
+  * COPY supports three modes: text, binary and raw. The text format is plain
+  * text multiline format with specified delimiter. The binary format holds
+  * metadata (numbers, sizes) and data. The raw format holds data only and
+  * only one non NULL value can be processed.
   */
  typedef struct CopyStateData
  {
*** typedef struct CopyStateData
*** 113,118 
--- 118,124 
  	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
  	bool		is_program;		/* is 'filename' a program to popen? */
  	bool

Re: [HACKERS] raw output from copy

2015-07-06 Thread Pavel Stehule

Hi

here is a version with both direction support.

postgres=# copy foo from '/tmp/1.jpg' (format raw);
COPY 1
Time: 93.021 ms
postgres=# \dt+ foo
   List of relations
┌┬──┬───┬───┬┬─┐
│ Schema │ Name │ Type  │ Owner │  Size  │ Description │
╞╪══╪═══╪═══╪╪═╡
│ public │ foo  │ table │ pavel │ 256 kB │ │
└┴──┴───┴───┴┴─┘
(1 row)

postgres=# \copy foo to '~/3.jpg' (format raw)
COPY 1
Time: 2.401 ms

Regards

Pavel

2015-07-02 17:02 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

 Andrew Dunstan and...@dunslane.net writes:
  Does the COPY line protocol even support binary data?

 The protocol, per se, just transmits a byte stream.  There is a field
 in the CopyInResponse/CopyOutResponse messages that indicates whether
 a text or binary copy is being done.  One thing we'd have to consider
 is whether raw mode is sufficiently different from binary to justify
 an additional value for this field, and if so whether that constitutes
 a protocol break.

 IIRC, psql wouldn't really care; it just transfers the byte stream to or
 from the target file, regardless of text or binary mode.  But there might
 be other client libraries that are smarter and expect binary mode to
 mean the binary file format specified in the COPY reference page.  So
 there may be value in being explicit about raw mode in these messages.

 A key point in all this is that people who need raw transfer probably
 need it in both directions, a point that your SELECT proposal cannot
 satisfy, but hacking COPY could.  So I lean towards the latter really.

 regards, tom lane

commit 5599347d6b0b29a2674d465b3ff03164fce59810
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Mon Jul 6 23:18:18 2015 +0200

COPY FROM/TO (FORMAT RAW)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 2850b47..4b7b64d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -190,7 +190,7 @@ COPY { replaceable class=parametertable_name/replaceable [ ( replaceable
   Selects the data format to be read or written:
   literaltext/,
   literalcsv/ (Comma Separated Values),
-  or literalbinary/.
+  literalbinary/ or literalraw/literal.
   The default is literaltext/.
  /para
 /listitem
@@ -881,6 +881,23 @@ OIDs to be shown as null if that ever proves desirable.
 /para
/refsect3
   /refsect2
+
+  refsect2
+ titleRaw Format/title
+
+   para
+The literalraw/literal format option causes all data to be
+stored/read as binary format rather than as text. It shares format
+for data with literalbinary/literal format. This format doesn't
+use any metadata - only row data in network byte order are exported
+or imported.
+   /para
+
+   para
+Because this format doesn't support any delimiter, only one value
+can be exported or imported. NULL values are not allowed.
+   /para
+  /refsect2
  /refsect1
 
  refsect1
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 8904676..2ad7eb1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -92,6 +92,11 @@ typedef enum EolType
  * it's faster to make useless comparisons to trailing bytes than it is to
  * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is TRUE
  * when we have to do it the hard way.
+ *
+ * COPY supports three modes: text, binary and raw. The text format is plain
+ * text multiline format with specified delimiter. The binary format holds
+ * metadata (numbers, sizes) and data. The raw format holds data only and
+ * only one non NULL value can be processed.
  */
 typedef struct CopyStateData
 {
@@ -113,6 +118,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	bool		binary;			/* binary format? */
+	bool		raw;			/* required raw binary? */
 	bool		oids;			/* include OIDs? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -202,6 +208,9 @@ typedef struct CopyStateData
 	char	   *raw_buf;
 	int			raw_buf_index;	/* next byte to process */
 	int			raw_buf_len;	/* total # of bytes stored */
+
+	/* field for RAW mode */
+	bool		row_processed;		/* true, when first row was processed */
 } CopyStateData;
 
 /* DestReceiver for COPY (SELECT) TO */
@@ -345,9 +354,16 @@ SendCopyBegin(CopyState cstate)
 		/* new way */
 		StringInfoData buf;
 		int			natts = list_length(cstate-attnumlist);
-		int16		format = (cstate-binary ? 1 : 0);
+		int16		format;
 		int			i;
 
+		if (cstate-raw)
+			format = 2;
+		else if (cstate-binary)
+			format = 1;
+		else
+			format = 0;
+
 		pq_beginmessage(buf, 'H');
 		pq_sendbyte(buf, format);		/* overall format */
 		pq_sendint(buf, natts, 2);
@@ -359,7 +375,7 @@ SendCopyBegin(CopyState cstate)
 	else if

Re: [HACKERS] raw output from copy

2015-07-05 Thread Pavel Stehule

2015-07-02 17:02 GMT+02:00 Tom Lane t...@sss.pgh.pa.us:

 Andrew Dunstan and...@dunslane.net writes:
  Does the COPY line protocol even support binary data?

 The protocol, per se, just transmits a byte stream.  There is a field
 in the CopyInResponse/CopyOutResponse messages that indicates whether
 a text or binary copy is being done.  One thing we'd have to consider
 is whether raw mode is sufficiently different from binary to justify
 an additional value for this field, and if so whether that constitutes
 a protocol break. sql/plpgsql_check_passive-9.6.sql


 IIRC, psql wouldn't really care; it just transfers the byte stream to or
 from the target file, regardless of text or binary mode.  But there might
 be other client libraries that are smarter and expect binary mode to
 mean the binary file format specified in the COPY reference page.  So
 there may be value in being explicit about raw mode in these messages.


The safe way is create new mode and propagate it on client. It should to
not break any current applications, because no one uses COPY RAW.



 A key point in all this is that people who need raw transfer probably
 need it in both directions, a point that your SELECT proposal cannot
 satisfy, but hacking COPY could.  So I lean towards the latter really.


yes, it has sense. I am not sure, if I'll have time to implement it in this
step, but I'll look on it.

regards

Pavel



 regards, tom lane

Re: [HACKERS] raw output from copy

2015-07-02 Thread Pavel Stehule

Hi

I'll do it today evening

Pavel

2015-07-02 12:55 GMT+02:00 Simon Riggs si...@2ndquadrant.com:

 On 1 July 2015 at 07:42, Pavel Golub pa...@microolap.com wrote:


 I looked through the patch. Sources are OK. However I didn't find any
 docs and test cases. Would you please provide me with short description on
 this feature and why it is important. Because I didn't manage to find the
 old Andrew Dunstan's post either.


 Feature sounds OK, so lets do it.

 Pavel S, please submit a polished patch. Coding guidelines, tests, docs
 etc. Set back to Waiting On Author.

 --
 Simon Riggshttp://www.2ndQuadrant.com/
 http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services

Re: [HACKERS] raw output from copy

2015-07-02 Thread Andrew Dunstan



On 07/02/2015 07:14 AM, Pavel Stehule wrote:

Hi

I'll do it today evening




Pavel,

Please don't top-post on the PostgreSQL lists. You've been around here 
long enough to know that bottom posting is our custom.


I posted a patch for this in 2013 at 
http://www.postgresql.org/message-id/50f2fa92.9040...@dunslane.net but 
it can apply to a SELECT, and doesn't need COPY. Nobody seemed very 
interested, so I dropped it. Apparently people now want something along 
these lines, which is good.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-02 Thread Andrew Dunstan



On 07/02/2015 09:43 AM, Simon Riggs wrote:
On 2 July 2015 at 14:02, Andrew Dunstan and...@dunslane.net 
mailto:and...@dunslane.net wrote:



Please don't top-post on the PostgreSQL lists. You've been around
here long enough to know that bottom posting is our custom.

I posted a patch for this in 2013 at
http://www.postgresql.org/message-id/50f2fa92.9040...@dunslane.net
but it can apply to a SELECT, and doesn't need COPY. Nobody seemed
very interested, so I dropped it. Apparently people now want
something along these lines, which is good.


It's a shame that both solutions are restricted to either COPY or psql.

Both of those are working on suggestions from Tom, so there is no 
history of preference there.


Can we have both please, gentlemen?

If we implemented Andrew's solution, how would we request it in a COPY 
statement? Seems like we would want the RAW format keyword anyway.






What's the use case? My original motivation was that I had a function 
that returned a bytea (it was a PDF in fact) that I wanted to be able to 
write to a file. Of course, this is easy enough to do with a client 
library like perl's DBD::Pg, but it seems sad to have to resort to that 
for something so simple.


My original suggestion 
(http://www.postgresql.org/message-id/4ea1b83b.2050...@pgexperts.com) 
was to invent a \bcopy command.


I don't have a problem in building in a RAW mode for copy, but we'll 
still need to teach psql how to deal with it.


Another case where it could be useful is JSON - so we can avoid having 
to play tricks like 
http://adpgtech.blogspot.com/2014/09/importing-json-data.html. Similar 
considerations probably apply to XML, and the tricks are less guaranteed 
to work.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-02 Thread Pavel Stehule

2015-07-02 16:02 GMT+02:00 Andrew Dunstan and...@dunslane.net:


 On 07/02/2015 09:43 AM, Simon Riggs wrote:

 On 2 July 2015 at 14:02, Andrew Dunstan and...@dunslane.net mailto:
 and...@dunslane.net wrote:


 Please don't top-post on the PostgreSQL lists. You've been around
 here long enough to know that bottom posting is our custom.

 I posted a patch for this in 2013 at
 http://www.postgresql.org/message-id/50f2fa92.9040...@dunslane.net
 but it can apply to a SELECT, and doesn't need COPY. Nobody seemed
 very interested, so I dropped it. Apparently people now want
 something along these lines, which is good.


 It's a shame that both solutions are restricted to either COPY or psql.

 Both of those are working on suggestions from Tom, so there is no history
 of preference there.

 Can we have both please, gentlemen?

 If we implemented Andrew's solution, how would we request it in a COPY
 statement? Seems like we would want the RAW format keyword anyway.




 What's the use case? My original motivation was that I had a function that
 returned a bytea (it was a PDF in fact) that I wanted to be able to write
 to a file. Of course, this is easy enough to do with a client library like
 perl's DBD::Pg, but it seems sad to have to resort to that for something so
 simple.

 My original suggestion (
 http://www.postgresql.org/message-id/4ea1b83b.2050...@pgexperts.com) was
 to invent a \bcopy command.

 I don't have a problem in building in a RAW mode for copy, but we'll still
 need to teach psql how to deal with it.


It can be used from psql without any problems.



 Another case where it could be useful is JSON - so we can avoid having to
 play tricks like 
 http://adpgtech.blogspot.com/2014/09/importing-json-data.html. Similar
 considerations probably apply to XML, and the tricks are less guaranteed to
 work.

 cheers

 andrew

Re: [HACKERS] raw output from copy

2015-07-02 Thread Andrew Dunstan



On 07/02/2015 09:02 AM, Andrew Dunstan wrote:


On 07/02/2015 07:14 AM, Pavel Stehule wrote:

Hi

I'll do it today evening




Pavel,

Please don't top-post on the PostgreSQL lists. You've been around here 
long enough to know that bottom posting is our custom.


I posted a patch for this in 2013 at 
http://www.postgresql.org/message-id/50f2fa92.9040...@dunslane.net 
but it can apply to a SELECT, and doesn't need COPY. Nobody seemed 
very interested, so I dropped it. Apparently people now want something 
along these lines, which is good.



For reference, here's the Wayback Machine's version of the original blog 
post referred to: 
http://web.archive.org/web/20110916023912/http://people.planetpostgresql.org/andrew/index.php?/archives/196-Clever-trick-challenge.html


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-02 Thread Simon Riggs

On 2 July 2015 at 14:02, Andrew Dunstan and...@dunslane.net wrote:


 Please don't top-post on the PostgreSQL lists. You've been around here
 long enough to know that bottom posting is our custom.

 I posted a patch for this in 2013 at 
 http://www.postgresql.org/message-id/50f2fa92.9040...@dunslane.net but
 it can apply to a SELECT, and doesn't need COPY. Nobody seemed very
 interested, so I dropped it. Apparently people now want something along
 these lines, which is good.


It's a shame that both solutions are restricted to either COPY or psql.

Both of those are working on suggestions from Tom, so there is no history
of preference there.

Can we have both please, gentlemen?

If we implemented Andrew's solution, how would we request it in a COPY
statement? Seems like we would want the RAW format keyword anyway.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services

Re: [HACKERS] raw output from copy

2015-07-02 Thread Pavel Stehule

2015-07-02 15:43 GMT+02:00 Simon Riggs si...@2ndquadrant.com:

 On 2 July 2015 at 14:02, Andrew Dunstan and...@dunslane.net wrote:


 Please don't top-post on the PostgreSQL lists. You've been around here
 long enough to know that bottom posting is our custom.

 I posted a patch for this in 2013 at 
 http://www.postgresql.org/message-id/50f2fa92.9040...@dunslane.net but
 it can apply to a SELECT, and doesn't need COPY. Nobody seemed very
 interested, so I dropped it. Apparently people now want something along
 these lines, which is good.


 It's a shame that both solutions are restricted to either COPY or psql.

 Both of those are working on suggestions from Tom, so there is no history
 of preference there.

 Can we have both please, gentlemen?

 If we implemented Andrew's solution, how would we request it in a COPY
 statement? Seems like we would want the RAW format keyword anyway.


I prefer a COPY like solution - it can be used on both sides (server,
client), and it can be used little bit simply for psql -c XXX pattern.

Regards

Pavel



 --
 Simon Riggshttp://www.2ndQuadrant.com/
 http://www.2ndquadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services

Re: [HACKERS] raw output from copy

2015-07-02 Thread Andrew Dunstan



On 07/02/2015 11:02 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

Does the COPY line protocol even support binary data?

The protocol, per se, just transmits a byte stream.  There is a field
in the CopyInResponse/CopyOutResponse messages that indicates whether
a text or binary copy is being done.  One thing we'd have to consider
is whether raw mode is sufficiently different from binary to justify
an additional value for this field, and if so whether that constitutes
a protocol break.

IIRC, psql wouldn't really care; it just transfers the byte stream to or
from the target file, regardless of text or binary mode.  But there might
be other client libraries that are smarter and expect binary mode to
mean the binary file format specified in the COPY reference page.  So
there may be value in being explicit about raw mode in these messages.

A key point in all this is that people who need raw transfer probably
need it in both directions, a point that your SELECT proposal cannot
satisfy, but hacking COPY could.  So I lean towards the latter really.





OK, let's do that. I await the result with interest.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-02 Thread Andrew Dunstan



On 07/02/2015 10:07 AM, Pavel Stehule wrote:



2015-07-02 16:02 GMT+02:00 Andrew Dunstan and...@dunslane.net 
mailto:and...@dunslane.net:



On 07/02/2015 09:43 AM, Simon Riggs wrote:

On 2 July 2015 at 14:02, Andrew Dunstan and...@dunslane.net
mailto:and...@dunslane.net mailto:and...@dunslane.net
mailto:and...@dunslane.net wrote:


Please don't top-post on the PostgreSQL lists. You've been
around
here long enough to know that bottom posting is our custom.

I posted a patch for this in 2013 at
   
http://www.postgresql.org/message-id/50f2fa92.9040...@dunslane.net

but it can apply to a SELECT, and doesn't need COPY.
Nobody seemed
very interested, so I dropped it. Apparently people now want
something along these lines, which is good.


It's a shame that both solutions are restricted to either COPY
or psql.

Both of those are working on suggestions from Tom, so there is
no history of preference there.

Can we have both please, gentlemen?

If we implemented Andrew's solution, how would we request it
in a COPY statement? Seems like we would want the RAW format
keyword anyway.




What's the use case? My original motivation was that I had a
function that returned a bytea (it was a PDF in fact) that I
wanted to be able to write to a file. Of course, this is easy
enough to do with a client library like perl's DBD::Pg, but it
seems sad to have to resort to that for something so simple.

My original suggestion
(http://www.postgresql.org/message-id/4ea1b83b.2050...@pgexperts.com)
was to invent a \bcopy command.

I don't have a problem in building in a RAW mode for copy, but
we'll still need to teach psql how to deal with it.


It can be used from psql without any problems.



In fact your patch will not work with psql's \copy nor to stdout at all, 
unless I'm misreading it:


   -if (cstate-binary)
   +if (cstate-binary || cstate-raw)
 ereport(ERROR,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg(COPY BINARY is not supported to stdout or from
   stdin)));


So it looks like you're only supporting this where the server is writing 
to a file. That's horribly narrow, and certainly doesn't meet my 
original need.


Does the COPY line protocol even support binary data? If not, we're dead 
in the water here from the psql POV. Because my patch doesn't use the 
COPY protocol it doesn't have this problem.


Perhaps we should do both, although I'm not sure I understand the use 
case for the COPY solution.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-07-02 Thread Simon Riggs

On 2 July 2015 at 15:07, Pavel Stehule pavel.steh...@gmail.com wrote:


 It can be used from psql without any problems.


It can, but your patch does not yet do that, while Andrew's does.

We want a solution that works from psql and other clients. Hopefully the
same-ish solution.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services

Re: [HACKERS] raw output from copy

2015-07-01 Thread Pavel Golub

Hello Pavel.

I looked through the patch. Sources are OK. However I didn't find any docs
and test cases. Would you please provide me with short description on this
feature and why it is important. Because I didn't manage to find the old
Andrew Dunstan's post either.

On Sat, Apr 11, 2015 at 12:26 AM, Pavel Stehule pavel.steh...@gmail.com
wrote:

 Hi

 I wrote a prototype of this patch, and it works well

 postgres=# set client_encoding to 'latin2';
 SET
 Time: 1.488 ms
 postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format
 'raw')
 COPY 1
 Time: 1.108 ms
 postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format
 'raw') ;
 ?xml version=1.0 encoding=LATIN2?xxpříliš žluťoučký kůň/xxTime:
 1.000 ms

 Regards

 Pavel

 2015-04-09 20:48 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 This thread was finished without real work. I have a real use case -
 export XML doc in non utf8 encoding.

 http://www.postgresql.org/message-id/16174.1319228...@sss.pgh.pa.us

 I propose to implement new format option RAW like Tom proposed.

 It requires only one row, one column result - and result is just raw
 binary data without size.

 Objections? Ideas?

 Regards

 Pavel




 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Nullus est in vitae sensus ipsa vera est sensus.

Re: [HACKERS] raw output from copy

2015-04-15 Thread Peter Eisentraut

On 4/10/15 5:26 PM, Pavel Stehule wrote:
 Hi
 
 I wrote a prototype of this patch, and it works well
 
 postgres=# set client_encoding to 'latin2';
 SET
 Time: 1.488 ms
 postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml
 (format 'raw')
 COPY 1
 Time: 1.108 ms
 postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format
 'raw') ;
 ?xml version=1.0 encoding=LATIN2?xxpříliš žluťoučký
 kůň/xxTime: 1.000 ms

I think you can get the same thing using regular psql output and just
turning off all field and record separators and tuple headers and so on.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] raw output from copy

2015-04-15 Thread Pavel Stehule

It would be nice, but it is not true. You can get correct non utf8 xml with
encoding specification only when binary mode is used. Psql doesn't support
binary mode.

Regards

Pavel
Dne 15. 4. 2015 22:06 napsal uživatel Peter Eisentraut pete...@gmx.net:

 On 4/10/15 5:26 PM, Pavel Stehule wrote:
  Hi
 
  I wrote a prototype of this patch, and it works well
 
  postgres=# set client_encoding to 'latin2';
  SET
  Time: 1.488 ms
  postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml
  (format 'raw')
  COPY 1
  Time: 1.108 ms
  postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format
  'raw') ;
  ?xml version=1.0 encoding=LATIN2?xxpříliš žluťoučký
  kůň/xxTime: 1.000 ms

 I think you can get the same thing using regular psql output and just
 turning off all field and record separators and tuple headers and so on.

Re: [HACKERS] raw output from copy

2015-04-10 Thread Pavel Stehule

Hi

I wrote a prototype of this patch, and it works well

postgres=# set client_encoding to 'latin2';
SET
Time: 1.488 ms
postgres=# \copy (select xmlelement(name xx, d) from d) to ~/d.xml (format
'raw')
COPY 1
Time: 1.108 ms
postgres=# copy (select xmlelement(name xx, d) from d) to stdout (format
'raw') ;
?xml version=1.0 encoding=LATIN2?xxpříliš žluťoučký kůň/xxTime:
1.000 ms

Regards

Pavel

2015-04-09 20:48 GMT+02:00 Pavel Stehule pavel.steh...@gmail.com:

 Hi

 This thread was finished without real work. I have a real use case -
 export XML doc in non utf8 encoding.

 http://www.postgresql.org/message-id/16174.1319228...@sss.pgh.pa.us

 I propose to implement new format option RAW like Tom proposed.

 It requires only one row, one column result - and result is just raw
 binary data without size.

 Objections? Ideas?

 Regards

 Pavel

commit 60c6701fe5a91c41e9ed0db99676c8b1a27e85e3
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Fri Apr 10 23:22:39 2015 +0200

initial

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 92ff632..5701f8b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -113,6 +113,7 @@ typedef struct CopyStateData
 	char	   *filename;		/* filename, or NULL for STDIN/STDOUT */
 	bool		is_program;		/* is 'filename' a program to popen? */
 	bool		binary;			/* binary format? */
+	bool		raw;			/* raw format - data only */
 	bool		oids;			/* include OIDs? */
 	bool		freeze;			/* freeze rows on loading? */
 	bool		csv_mode;		/* Comma Separated Value format? */
@@ -348,6 +349,13 @@ SendCopyBegin(CopyState cstate)
 		int16		format = (cstate-binary ? 1 : 0);
 		int			i;
 
+		if (cstate-binary)
+			format = 1;
+		else if (cstate-raw)
+			format = 2;
+		else
+			format = 0;
+
 		pq_beginmessage(buf, 'H');
 		pq_sendbyte(buf, format);		/* overall format */
 		pq_sendint(buf, natts, 2);
@@ -359,7 +367,7 @@ SendCopyBegin(CopyState cstate)
 	else if (PG_PROTOCOL_MAJOR(FrontendProtocol) = 2)
 	{
 		/* old way */
-		if (cstate-binary)
+		if (cstate-binary || cstate-raw)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			errmsg(COPY BINARY is not supported to stdout or from stdin)));
@@ -371,7 +379,7 @@ SendCopyBegin(CopyState cstate)
 	else
 	{
 		/* very old way */
-		if (cstate-binary)
+		if (cstate-binary || cstate-raw)
 			ereport(ERROR,
 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			errmsg(COPY BINARY is not supported to stdout or from stdin)));
@@ -485,7 +493,7 @@ CopySendEndOfRow(CopyState cstate)
 	switch (cstate-copy_dest)
 	{
 		case COPY_FILE:
-			if (!cstate-binary)
+			if (!(cstate-binary || cstate-raw))
 			{
 /* Default line termination depends on platform */
 #ifndef WIN32
@@ -543,7 +551,7 @@ CopySendEndOfRow(CopyState cstate)
 			break;
 		case COPY_NEW_FE:
 			/* The FE/BE protocol uses \n as newline for all platforms */
-			if (!cstate-binary)
+			if (!(cstate-binary || cstate-raw))
 CopySendChar(cstate, '\n');
 
 			/* Dump the accumulated row as one CopyData message */
@@ -1005,6 +1013,8 @@ ProcessCopyOptions(CopyState cstate,
 cstate-csv_mode = true;
 			else if (strcmp(fmt, binary) == 0)
 cstate-binary = true;
+			else if (strcmp(fmt, raw) == 0)
+cstate-raw = true;
 			else
 ereport(ERROR,
 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -1808,6 +1818,10 @@ CopyTo(CopyState cstate)
 	num_phys_attrs = tupDesc-natts;
 	cstate-null_print_client = cstate-null_print;		/* default */
 
+	/* don't allow more columns for raw format */
+	if (tupDesc-natts  1)
+		elog(ERROR, too much columns for RAW output);
+
 	/* We use fe_msgbuf as a per-row buffer regardless of copy_dest */
 	cstate-fe_msgbuf = makeStringInfo();
 
@@ -1819,7 +1833,7 @@ CopyTo(CopyState cstate)
 		Oid			out_func_oid;
 		bool		isvarlena;
 
-		if (cstate-binary)
+		if ((cstate-binary || cstate-raw))
 			getTypeBinaryOutputInfo(attr[attnum - 1]-atttypid,
 	out_func_oid,
 	isvarlena);
@@ -1858,7 +1872,7 @@ CopyTo(CopyState cstate)
 		tmp = 0;
 		CopySendInt32(cstate, tmp);
 	}
-	else
+	else if (!cstate-raw)
 	{
 		/*
 		 * For non-binary copy, we need to convert null_print to file
@@ -1970,7 +1984,7 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 			CopySendInt32(cstate, tupleOid);
 		}
 	}
-	else
+	else if (!cstate-raw)
 	{
 		/* Text format has no per-tuple header, but send OID if wanted */
 		/* Assume digits don't need any quoting or encoding conversion */
@@ -1998,14 +2012,16 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 
 		if (isnull)
 		{
-			if (!cstate-binary)
+			if (cstate-raw)
+elog(ERROR, cannot to push NULL in raw output);
+			else if (!cstate-binary)
 CopySendString(cstate, cstate-null_print_client);
 			else
 CopySendInt32(cstate, -1);
 		}
 		else
 		{
-			if (!cstate-binary)
+			if (!(cstate-binary || cstate-raw))
 			{
 string = OutputFunctionCall(out_functions[attnum - 1],
 			value);
diff

[HACKERS] raw output from copy

2015-04-09 Thread Pavel Stehule

Hi

This thread was finished without real work. I have a real use case - export
XML doc in non utf8 encoding.

http://www.postgresql.org/message-id/16174.1319228...@sss.pgh.pa.us

I propose to implement new format option RAW like Tom proposed.

It requires only one row, one column result - and result is just raw binary
data without size.

Objections? Ideas?

Regards

Pavel

78 matches

Mail list logo