Re: [GENERAL] Why copy ... from stdio does not return immediately when reading invalid data?

2011-02-07 Thread Nicolas Grilly
I have analyzed the PostgreSQL protocol using Wireshark (an open source
packet analyzer), and I observed that the PostgreSQL backend, while doing a
COPY ... FROM STDIN, reports errors as soon as possible (especially errors
related to invalid data).

Therefore, the late reporting of errors while doing a COPY ... FROM STDIN
is not a limitation of the underlying protocol; it is a limitation (or a
design choice) of the libpq library.

It looks like this is a well known issue because it is listed on the todo
list:
http://wiki.postgresql.org/wiki/Todo#COPY

And was discussed before:
http://archives.postgresql.org/pgsql-hackers/2008-04/msg01169.php

Do you think it is possible to change that behavior, or work around it?

While reading libpq source code, I noticed the function pqParseInput3 (file
fe-protocol3.c) ignores error responses while the connection is
in PGASYNC_COPY_IN state. Maybe we can make a special case for the COPY FROM
subprotocol and handle errors early, in order to make them available to
PQgetResult? Is is feasible in a simple way or is it a bad idea?

Regards,

Nicolas Grilly

On Wed, Feb 2, 2011 at 20:06, John R Pierce pie...@hogranch.com wrote:

 On 02/02/11 10:20 AM, Nicolas Grilly wrote:

 Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send
 gigabytes of data with just one copy ... from stdio query, and is there a
 way to be notified of a potential error before calling PQputCopyEnd? Or do I
 have to send my data in small chunks (for example batch of 1 rows),
 issue a PQputCopyEnd, check for errors, and continue with the next chunk?


 I would batch the data, maybe 1000 lines or even 100 lines at a time if
 these errors are at all frequent.  put the errored batches in an exception
 list or something so you can sort them out later.



[GENERAL] Why copy ... from stdio does not return immediately when reading invalid data?

2011-02-02 Thread Nicolas Grilly
Hello,

I am importing gigabytes of data into PostgreSQL, and I don't want to wait
10 minutes just to discover an error in the 10th line of my input file.

I tried the command \copy ... from stdio in psql and it looks like psql
has to read the entire input before returning a potential error, even if the
invalid value is in one of the first rows.

Is it a limitation of PostgreSQL protocol, of the library lipq, or of the
tool psql?

Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to send
gigabytes of data with just one copy ... from stdio query, and is there a
way to be notified of a potential error before calling PQputCopyEnd? Or do I
have to send my data in small chunks (for example batch of 1
rows), issue a PQputCopyEnd, check for errors, and continue with the next
chunk?

Thanks for your help and advice.

Regards,

Nicolas Grilly


Re: [GENERAL] Why copy ... from stdio does not return immediately when reading invalid data?

2011-02-02 Thread John R Pierce

On 02/02/11 10:20 AM, Nicolas Grilly wrote:
Is the copy protocol (aka PQputCopyData and PQputCopyEnd) designed to 
send gigabytes of data with just one copy ... from stdio query, and 
is there a way to be notified of a potential error before calling 
PQputCopyEnd? Or do I have to send my data in small chunks (for 
example batch of 1 rows), issue a PQputCopyEnd, check for errors, 
and continue with the next chunk?


I would batch the data, maybe 1000 lines or even 100 lines at a time if 
these errors are at all frequent.  put the errored batches in an 
exception list or something so you can sort them out later.




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general